Re: questions about 2007 Legal Track routing & interassessor (in)consistency



Picking up this discussion again...

>> As we've been discussing, interassessor consistency is bad and/or
>> strange for many 2006 Legal Track ad hoc topics.   That raises


>> 1.  Should we omit some of these topics from the 2007 Legal Track
>> routing evaluation

Chris said:
> I think that dropping the number of topics will have a bigger
> negative impact on the reliability of the results.

I'm inclined to agree unless, as you indicated, we just don't have  
the assessment budget to do a reasonable job on all of them.  Keeping  
all the topics is particularly desirable given the next point:


>> levels of interassessor inconsistency.  So how should the final qrels
>> for the collection be produced:
>>
>>      2a. Go ahead and take the union?
>>
>>      2b.  Distribute two alternate sets of qrels with the collection,
>> one based on 2006 ad hoc and one based on 2007 routing?
>>
>>      2c. Distribute only the 2007 routing qrels?
>>
>>      2d. Take the union of the 2006 ad hoc relevant with the 2007
>> routing relevant and nonrelevant (a kind of maximally broad
>> definition of relevance).

Chris:
> My preference is 2b, without some concrete reason to do 2a.  A

I agree with your case for this.  A researcher can always combine the  
two sets themselves.
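
For what it's worth, here is a rough sketch of how someone might do that
combination on their own. It assumes both sets are in the usual four-column
TREC qrels layout (topic, iteration, docno, relevance). The function names
parse_qrels and union_qrels are made up for illustration, and the rule that
a document counts as relevant if either assessor said so is just one
possible way to resolve disagreements, not something we've settled on.

```python
def parse_qrels(lines):
    """Parse TREC-style qrels lines: 'topic iteration docno relevance'."""
    judgments = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        topic, _iteration, docno, rel = parts
        judgments[(topic, docno)] = int(rel)
    return judgments


def union_qrels(first, second):
    """Merge two qrels sets.

    Assumption (not from the thread): when both sets judge the same
    document, the higher relevance value wins, so a document is relevant
    if either assessor marked it relevant.
    """
    merged = dict(first)
    for key, rel in second.items():
        if key in merged:
            merged[key] = max(merged[key], rel)
        else:
            merged[key] = rel
    return merged


# Hypothetical toy data, one topic, three documents.
adhoc_2006 = parse_qrels(["7 0 docA 1", "7 0 docB 0"])
routing_2007 = parse_qrels(["7 0 docB 1", "7 0 docC 0"])
combined = union_qrels(adhoc_2006, routing_2007)
```

A stricter rule (say, relevant only if both agree) would just swap max for
min on the overlapping documents, which is part of why distributing the two
sets separately leaves more options open.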

>> 3. If the answer to 2 is 2a, 2b, or 2d, should qrels from 2006 ad hoc
>> Assessor 2 be thrown in as well?

Chris:
> Strongly against this.  They're non-random partial judgements.

On reflection, my question doesn't make much sense in the context of  
2b, so this is moot.


>> 4. If the answer to 2 is 2b or 2c, should we have the relevant from
>> 2006 ad hoc thrown into the 2007 routing pools to be reassessed, as a
>> particularly rich source of relevant documents.
>

Chris:
> Not because they're a rich source of relevant documents, no.  I
> see no major reason why we should regard the 2006 judgements as
> "bad" judgements. The "lawyers being legalistic" argument is a
> reason; I don't think it's major, but I don't think we
> know. However, that's a reason to include the NON-relevant
> documents to be reassessed, not the relevant documents.
>
> Given infinite resources, I have no objection to replacing the
> 2006 judgements.  I'm sure we can improve them.  But we don't
> have infinite resources.

Actually, I wasn't talking about replacing the 2006 judgments, which
I agree we should keep. I meant treating the documents found relevant
in 2006 for a topic as a kind of super-manual-expert run that
contributes to the pool to be assessed by the 2007 assessor.

Of course, if we do residual collection evaluation for the 2007
participant runs, these documents will not impact the scores of
2007 routing participant runs.  They will only be useful for future
ad hoc research on the collection.   But there are only about 4600 of
these documents in total, so assuming pool sizes similar to last year's,
they would add less than 15% to the number of documents to be assessed.
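
To make the residual collection point concrete, here is a minimal sketch
of the idea as I understand it: documents already judged in 2006 are
removed from both the ranking and the relevant set before scoring, so they
cannot help or hurt a 2007 run. The function names and toy data are
hypothetical, and precision at k is used only as a simple stand-in for
whatever measure we end up reporting.

```python
def residual_ranking(ranking, previously_judged):
    """Drop documents that were already judged (e.g. in 2006) from a ranked list."""
    return [docno for docno in ranking if docno not in previously_judged]


def precision_at_k(ranking, relevant, k):
    """Fraction of the top k ranked documents that are relevant."""
    if k <= 0:
        return 0.0
    hits = sum(1 for docno in ranking[:k] if docno in relevant)
    return hits / k


# Hypothetical toy data for a single topic.
run = ["d1", "d2", "d3", "d4"]
judged_2006 = {"d1", "d3"}
relevant_2007 = {"d1", "d2"}

# Score only on the residual collection.
residual_run = residual_ranking(run, judged_2006)
residual_relevant = relevant_2007 - judged_2006
score = precision_at_k(residual_run, residual_relevant, 2)
```

In the toy data, d1 is relevant but was already judged in 2006, so it
drops out of both the run and the relevant set before scoring.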


>> 5. Do the answers to the above questions change the best strategy for
>> evaluating routing.  In particular if we adopt 2c (with either answer
>> to 4), is residual collection evaluation still necessary?
>
> Yes, I'm not sure what has changed.  The issue is artificial
> evaluation effects on the routing task.  Those still exist
> whether or not the 2006 judgements are released.  Eg, the
> duplicate document problem remains a problem.

I agree with this, though since we'll be evaluating 2007 participants  
with 2007 assessments only, the evaluation effects would not be as  
bad as usual.

>> 6. Should additional studies of interassessor consistency be built
>> into the routing evaluation?  If so, what?   Should we keep the
>> option open of omitting some 2006 routing topics from the final
>> collection?
>
> Additional studies are a function of resources available.  If
> we're going to change the topics to help consistency in the
> ad hoc 2007, I think we need most of the resources there.

Good point. Probably the consistency studies should focus on the 2007  
ad hoc task.

> Thanks for bringing up all these points so thoroughly, Dave!
>
> Chris

And thanks for your help!

Dave



