Re: questions about 2007 Legal Track routing & interassessor (in)consistency
- From: "Dave Lewis (address for public mailing lists)" <misclists1@daviddlewis.com>
- Date: Tue, 13 Feb 2007 16:27:00 -0600
- In-Reply-To: <200702021355.l12DtGwj000821@pc5.sabir.com>
- References: <200702021355.l12DtGwj000821@pc5.sabir.com>
Picking up this discussion again...
>> As we've been discussing, interassessor consistency is bad and/or
>> strange for many 2006 Legal Track ad hoc topics. That raises
>> 1. Should we omit some of these topics from the 2007 Legal Track
>> routing evaluation
Chris said:
> I think that dropping the number of topics will have a bigger
> negative impact on the reliability of the results.
I'm inclined to agree unless, as you indicated, we just don't have
the assessment budget to do a reasonable job on all of them. Keeping
all the topics is particularly desirable given the next point:
>> levels of interassessor inconsistency. So how should the final qrels
>> for the collection be produced:
>>
>> 2a. Go ahead and take the union?
>>
>> 2b. Distribute two alternate sets of qrels with the collection,
>> one based on 2006 ad hoc and one based on 2007 routing?
>>
>> 2c. Distribute only the 2007 routing qrels?
>>
>> 2d. Take the union of the 2006 ad hoc relevant with the 2007
>> routing relevant and nonrelevant (a kind of maximally broad
>> definition of relevance).
Chris:
> My preference is 2b, without some concrete reason to do 2a. A
I agree with your case for this. A researcher can always combine the
two sets themselves.
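
To make "combine the two sets themselves" concrete, here is a minimal
sketch of option 2a done after the fact by a researcher who received the
two qrels sets of option 2b. It assumes the usual four-column TREC qrels
layout (topic, iteration, docno, relevance). The function names and the
document IDs in the usage note are hypothetical, not part of any track
tooling.

```python
def load_qrels(lines):
    """Parse TREC-style qrels lines: 'topic iteration docno relevance'."""
    qrels = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        topic, _iteration, docno, rel = parts
        qrels.setdefault(topic, {})[docno] = int(rel)
    return qrels


def union_qrels(a, b):
    """Union of two qrels sets (option 2a).

    A document counts as relevant if either set judged it relevant.
    Documents judged only nonrelevant stay in the result as nonrelevant.
    """
    merged = {}
    for topic in set(a) | set(b):
        docs = {}
        for source in (a, b):
            for docno, rel in source.get(topic, {}).items():
                docs[docno] = max(rel, docs.get(docno, 0))
        merged[topic] = docs
    return merged
```

For example, with hypothetical judgments
adhoc_2006 = load_qrels(["7 0 docA 1", "7 0 docB 0"]) and
routing_2007 = load_qrels(["7 0 docB 1", "7 0 docC 0"]),
union_qrels(adhoc_2006, routing_2007) marks docA and docB relevant and
docC nonrelevant for topic 7. Taking the max is only one way to resolve
a disagreement between the two assessors; other rules are possible.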
>> 3. If the answer to 2 is 2a, 2b, or 2d, should qrels from 2006 ad hoc
>> Assessor 2 be thrown in as well?
Chris:
> Strongly against this. They're non-random partial judgements.
On reflection, my question doesn't make much sense in the context of
2b, so this is moot.
>> 4. If the answer to 2 is 2b or 2c, should we have the relevant from
>> 2006 ad hoc thrown into the 2007 routing pools to be reassessed, as a
>> particularly rich source of relevant documents.
>
Chris:
> Not because they're a rich source of relevant documents, no. I
> see no major reason why we should regard the 2006 judgements as
> "bad" judgements. The "lawyers being legalistic" argument is a
> reason; I don't think it's major, but I don't think we
> know. However, that's a reason to include the NON-relevant
> documents to be reassessed, not the relevant documents.
>
> Given infinite resources, I have no objection to replacing the
> 2006 judgements. I'm sure we can improve them. But we don't
> have infinite resources.
Actually, I wasn't talking about replacing the 2006 judgments, which
I agree we should keep. I meant having those documents found
relevant in 2006 for a topic treated as a kind of super-manual-expert
run contributing to the pool to be assessed by the 2007 assessor.
Of course, if we do residual collection evaluation for the 2007
participant runs, these documents will not impact the scores of
2007 routing participant runs. They will only be useful for future
ad hoc research on the collection. But there are only about 4600 of
these documents in total, so assuming pool sizes similar to last year's,
they would add less than 15% to the number of documents to be assessed.
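
A rough sketch of the two ideas in this paragraph, under stated
assumptions. Residual collection evaluation is taken here to mean
dropping already-judged documents from a run's ranking before scoring.
The overhead estimate assumes none of the extra documents would already
be in the pool. The function names and the pool size of 40000 are
hypothetical; the actual pool size is not given in this message.

```python
def residual_ranking(ranking, already_judged):
    """Drop previously judged documents from a ranked list.

    ranking: list of docnos in rank order.
    already_judged: set of docnos judged in the earlier (2006) round.
    The relative order of the remaining documents is preserved.
    """
    return [docno for docno in ranking if docno not in already_judged]


def pool_overhead(extra_docs, pool_size):
    """Fraction by which extra_docs would enlarge a pool of pool_size.

    Assumes no overlap between the extra documents and the pool.
    """
    return extra_docs / pool_size


# 4600 extra documents stay under 15% whenever the pool holds more than
# 4600 / 0.15 documents, i.e. roughly 30,667.
break_even_pool_size = 4600 / 0.15
```

With the hypothetical pool of 40000 documents, pool_overhead(4600, 40000)
gives 0.115, i.e. 11.5% more documents to assess.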
>> 5. Do the answers to the above questions change the best strategy for
>> evaluating routing. In particular if we adopt 2c (with either answer
>> to 4), is residual collection evaluation still necessary?
>
> Yes, I'm not sure what has changed. The issue is artificial
> evaluation effects on the routing task. Those still exist
> whether or not the 2006 judgements are released. Eg, the
> duplicate document problem remains a problem.
I agree with this, though since we'll be evaluating 2007 participants
with 2007 assessments only, the evaluation effects would not be as
bad as usual.
>> 6. Should additional studies of interassessor consistency be built
>> into the routing evaluation? If so, what? Should we keep the
>> option open of omitting some 2006 routing topics from the final
>> collection?
>
> Additional studies are a function of resources available. If
> we're going to change the topics to help consistency in the
> ad hoc 2007, I think we need most of the resources there.
Good point. Probably the consistency studies should focus on the 2007
ad hoc task.
> Thanks for bringing up all these points so thoroughly, Dave!
>
> Chris
And thanks for your help!
Dave