Re: evaluating routing when the document collection is reused
>> CHRIS:
>> While doing extensive relevance feedback experiments, I gave up
>> on doing anything other than "residual collection". There were
>> just too many evaluation artifacts when doing anything else.
>
>> So I would be against A, C, and D on those grounds.
>
> STEPHEN R: I'm inclined to agree with Chris on A, C, D.
Steve, OK, from this and the rest of your message it sounds like we have
Steve Robertson: B, E > A, C, D
and from other messages:
Stephen Tomlinson: B > A, C, D, E
Chris Buckley: B > E > A, C, D
Dave Lewis: A, B > E > D > C
So B (residual collection) is looking like the choice unless someone
has new thoughts.
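
For anyone who wants the residual-collection idea in concrete form, here is a minimal sketch. It assumes a system returns a ranked list of document IDs and that the documents judged for training are known as a set. The function names, the document IDs, and the choice of precision at k as the measure are all illustrative assumptions, not anything agreed on in this thread.

```python
def residual_ranking(ranking, training_judged):
    """Drop every document judged for training from a ranked list,
    keeping the remaining documents in their original order."""
    return [doc for doc in ranking if doc not in training_judged]


def precision_at_k(ranking, relevant, k):
    """Fraction of the top k ranked documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


# Hypothetical data: d1 and d3 were judged during training.
ranking = ["d1", "d2", "d3", "d4", "d5"]
training_judged = {"d1", "d3"}
relevant = {"d1", "d4"}

residual = residual_ranking(ranking, training_judged)
score = precision_at_k(residual, relevant, 2)
```

The point of the filter is that a system gets no credit for re-retrieving a document it already saw judged; only the documents left in the residual ranking are scored.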
>> CHRIS:
>> We also have the problem of duplicate (and near duplicate)
>> documents. For most (but not all) of last year's judgements, if
>> documents were close to each other in similarity, they would both
>> be judged or not judged. So this problem will have a bigger
>> impact on E than B (residual collection) since B will not
>> evaluate on either of a duplicate pair if they both were judged,
>> but E might use one for training and one for testing. Thus you
>> have an artificial effect with E.
Good point - this is a serious problem with E.
> STEVE R: I agree that this is an issue. Do we know how serious a
> problem it is in this collection?
Very serious. In the pools there were cases of 50 or more
near-duplicate documents, and I know that in the collection there are
at least a few hundred copies of some documents.
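
To make the leak under E concrete, here is a rough sketch of how one might count near-duplicate groups that straddle a training/test split. It assumes the duplicate groups have already been identified by some means (how to detect them is not addressed here), and the names `straddling_groups` and `half_of`, along with the document IDs, are hypothetical.

```python
def straddling_groups(duplicate_groups, half_of):
    """Return the duplicate groups whose members fall in more than one half.

    duplicate_groups: list of lists of document IDs (assumed precomputed).
    half_of: dict mapping each document ID to "train" or "test".
    """
    return [
        group
        for group in duplicate_groups
        if len({half_of[doc] for doc in group}) > 1
    ]


# Hypothetical split: the "b" group has copies on both sides.
groups = [["a1", "a2"], ["b1", "b2", "b3"]]
half_of = {
    "a1": "train", "a2": "train",
    "b1": "train", "b2": "test", "b3": "test",
}

leaky = straddling_groups(groups, half_of)
```

Any group returned here is a case where a system could train on one copy and be tested on another, which is the artificial effect Chris describes.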
>>> E. "Test and Control": Splitting entire document set into
>>> training and test halves. Only assessed documents that fall into
>>> the
:
:
>>> training. This would have the disadvantage of only improving
>>> judgments on the test half of the collection, though one could do
>>> this twice, exchanging the roles of the training and test halves.
>
> I like the idea of doing it twice. But one possible issue: do you
> imagine that any of the participants will try manual query
> formulation for routing? The two-fold replication might be
> difficult in that case -- couldn't really have the same user doing
> the formulation both ways round.
Yet another problem with E. OK, E is DOA.
Dave