questions about 2007 Legal Track routing & interassessor (in)consistency
- From: Dave Lewis (address for public mailing lists) <misclists1@daviddlewis.com>
- Date: Thu, 1 Feb 2007 13:33:33 -0600
As we've been discussing, interassessor consistency is bad and/or
strange for many 2006 Legal Track ad hoc topics. That raises
several questions:
1. Should we omit some of these topics from the 2007 Legal Track
routing evaluation because we already have evidence that they are
susceptible to high interassessor inconsistency? This would probably
mean the same topics should be dropped from the test collection
entirely, since the 2006 ad hoc pool is of questionable quality (only
6 participants, many technical problems). If so, what should the
test be for dropping a topic?
2. I had envisioned that, independent of the style of routing
evaluation (e.g. residual collection), the union of 2006 ad hoc
qrels and the 2007 routing qrels would be used as the qrels for these
topics going forward. But maybe this is unrealistic given the high
levels of interassessor inconsistency. So how should the final qrels
for the collection be produced:
2a. Go ahead and take the union?
2b. Distribute two alternate sets of qrels with the collection,
one based on 2006 ad hoc and one based on 2007 routing?
2c. Distribute only the 2007 routing qrels?
2d. Take the union of the 2006 ad hoc relevant with the 2007
routing relevant and nonrelevant (a kind of maximally broad
definition of relevance)?
3. If the answer to 2 is 2a, 2b, or 2d, should qrels from 2006 ad hoc
Assessor 2 be thrown in as well?
4. If the answer to 2 is 2b or 2c, should the documents judged
relevant in 2006 ad hoc be added to the 2007 routing pools to be
reassessed, as a particularly rich source of relevant documents?
5. Do the answers to the above questions change the best strategy for
evaluating routing? In particular, if we adopt 2c (with either answer
to 4), is residual collection evaluation still necessary?
6. Should additional studies of interassessor consistency be built
into the routing evaluation? If so, what? Should we keep the
option open of omitting some 2006 routing topics from the final
collection?
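To make options 2a through 2d concrete, here is a rough Python sketch on
toy data. It assumes binary qrels stored as a dict from document ID to
0/1; the function names and document IDs are made up for illustration.
Two readings are assumptions, not something settled above: for 2a, when
the two years disagree on a document, the 2007 judgment wins; for 2d, a
2006 relevant judgment overrides a 2007 nonrelevant one, and the 2006
nonrelevant judgments are dropped.

```python
# Hypothetical toy qrels for one topic: doc ID -> 1 (relevant) or 0 (nonrelevant).
adhoc_2006 = {"d1": 1, "d2": 0, "d3": 1}
routing_2007 = {"d3": 0, "d4": 1, "d5": 0}


def option_2a(adhoc, routing):
    """2a: plain union of both sets of judgments.

    Assumption: on a conflict, the 2007 routing judgment wins.
    """
    merged = dict(adhoc)
    merged.update(routing)
    return merged


def option_2b(adhoc, routing):
    """2b: keep the two qrels sets separate and distribute both."""
    return {"adhoc_2006": dict(adhoc), "routing_2007": dict(routing)}


def option_2c(routing):
    """2c: distribute only the 2007 routing judgments."""
    return dict(routing)


def option_2d(adhoc, routing):
    """2d: all 2007 judgments plus the 2006 relevant documents.

    Assumption: a 2006 relevant judgment overrides a 2007 nonrelevant
    one; 2006 nonrelevant judgments are not carried over.
    """
    merged = dict(routing)
    for doc_id, rel in adhoc.items():
        if rel > 0:
            merged[doc_id] = 1
    return merged


if __name__ == "__main__":
    print("2a:", option_2a(adhoc_2006, routing_2007))
    print("2b:", option_2b(adhoc_2006, routing_2007))
    print("2c:", option_2c(routing_2007))
    print("2d:", option_2d(adhoc_2006, routing_2007))
```

In this toy case d3 is the only conflicting document: it ends up
nonrelevant under 2a and 2c and relevant under 2d, which is the kind of
difference the choice among these options would have to settle.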
Dave