Re: interassessor consistency data on TREC 06 Legal track ad hoc topics
"Dave Lewis (address for public mailing lists)"
<misclists1@daviddlewis.com> writes:
> There is one systematic factor that's less benign that I worried a
> little about. Most of the people who played the role of Assessor 2
> on one or more topics had also played the role of Assessor 1 on some
> other topic. Since their experience as Assessor 1 was usually that
> the proportion of relevant was very small, I wonder if they carried
> that over to their judgments in the role of Assessor 2. We could
> have avoided this by taking a random sample from the pool, instead of
> 25 relevant and 25 nonrelevant, but then for most topics we'd then
> end up with most dual assessments having been on "easy" nonrelevant
> documents.
Maybe there's no cure for that in this data, but next time around you
can try to hold back a larger budget for secondary assessment. In the
enterprise track last year we were able to give the second assessor a
10% random sample of the pool, which was large enough that nearly all
topics had (round-1) relevant documents in the sample, and nearly all
had at least one document judged relevant by the second assessor as
well.
This of course depends on the expected rate of relevant documents in
the pool (which is itself affected by many factors), but IIRC there
were enough relevant docs in the legal track topics to make it not so
bad.
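
As a rough check on whether a given sampling rate is "large enough",
here is a minimal sketch that computes the chance that a uniform
random sample of the pool (drawn without replacement) contains at
least one relevant document. The function name and the numbers in the
usage line are hypothetical, not figures from either track.

```python
from math import comb

def prob_at_least_one_relevant(pool_size, sample_size, num_relevant):
    """Chance that a uniform sample without replacement from the pool
    contains at least one relevant document (hypergeometric)."""
    if not 0 <= sample_size <= pool_size:
        raise ValueError("sample_size must be between 0 and pool_size")
    if not 0 <= num_relevant <= pool_size:
        raise ValueError("num_relevant must be between 0 and pool_size")
    # Probability that every sampled document is nonrelevant.
    # comb() returns 0 when the sample cannot fit in the nonrelevant set.
    none_relevant = (comb(pool_size - num_relevant, sample_size)
                     / comb(pool_size, sample_size))
    return 1.0 - none_relevant

# Hypothetical topic: pool of 800 documents, 10% sample, 20 relevant.
print(prob_at_least_one_relevant(800, 80, 20))
```

The same calculation can be run per topic with the round-1 relevant
counts to see which topics are likely to end up with no relevant
documents in the secondary sample.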
The bonus is that it's a lot easier to measure effectiveness using a
uniform random sample of the pool than a funky sample of the pool.
For a ranked run, anyway.
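
To illustrate why the uniform case is simple: because every pooled
document has the same chance of being sampled, the sampled documents
in the top k of a run can stand in for the whole top k with no
per-document weights. The sketch below is one simple estimator under
that assumption, not the measure either track actually used; the
function name and data layout are hypothetical.

```python
def estimated_precision_at_k(ranking, sample_judgments, k):
    """Estimate precision at rank k from a uniform random sample.

    ranking: list of document ids, best first.
    sample_judgments: dict mapping each sampled document id to
        True (relevant) or False (nonrelevant).
    Returns None when no sampled document falls in the top k.
    """
    # Keep only the top-k documents that were in the judged sample.
    judged = [doc for doc in ranking[:k] if doc in sample_judgments]
    if not judged:
        return None
    relevant = sum(1 for doc in judged if sample_judgments[doc])
    return relevant / len(judged)
```

With a non-uniform sample (such as 25 relevant plus 25 nonrelevant),
each judged document would need a weight reflecting its own sampling
probability, which is where the extra difficulty comes from.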
Ian