Re: evaluating routing when the document collection is reused



>> CHRIS:
>> While doing extensive relevance feedback experiments, I gave up
>> on doing anything other than "residual collection".  There were
>> just too many evaluation artifacts when doing anything else.
>
>> So I would be against A, C, and D on those grounds.
>
> STEPHEN R: I'm inclined to agree with Chris on A, C, D.

OK, Steve, from this and the rest of your message, it sounds like we have:

    Steve Robertson : B, E > A, C, D

and from other messages:

    Stephen Tomlinson: B > A,C,D,E

    Chris Buckley: B > E > A,C,D

    Dave Lewis:   A,B > E > D > C

So B (residual collection) is looking like the choice unless someone  
has new thoughts.

>> CHRIS:
>> We also have the problem of duplicate (and near duplicate)
>> documents. For most (but not all) of last year's judgements, if
>> documents were close to each other in similarity, they would both
>> be judged or not judged.  So this problem will have a bigger
>> impact on E than B (residual collection) since B will not
>> evaluate on either of a duplicate pair if they both were judged,
>> but E might use one for training and one for testing. Thus you
>> have an artificial effect with E.

Good point - this is a serious problem with E.
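To make Chris's point concrete, here is a minimal sketch of residual-collection scoring. It assumes precision at k as the measure and hypothetical document IDs; the function name is illustrative, not anything NIST or trec_eval defines. Every document judged during training is dropped from the ranking before scoring, so if both members of a duplicate pair were judged, neither one is evaluated.

```python
from typing import Dict, List, Set


def residual_precision_at_k(ranking: List[str],
                            qrels: Dict[str, int],
                            training_judged: Set[str],
                            k: int) -> float:
    """Precision at k over the residual collection (hypothetical helper).

    Documents judged during training are removed from the ranking first,
    so they can neither help nor hurt the score.
    """
    residual = [d for d in ranking if d not in training_judged]
    top = residual[:k]
    relevant = sum(1 for d in top if qrels.get(d, 0) > 0)
    # Divide by k, not len(top): a short residual list is penalized.
    return relevant / k


ranking = ["d1", "d2", "d3", "d4", "d5"]
qrels = {"d1": 1, "d3": 1, "d4": 1, "d5": 0}
print(residual_precision_at_k(ranking, qrels, {"d1", "d2"}, 2))  # 1.0
print(residual_precision_at_k(ranking, qrels, set(), 2))         # 0.5
```

The two calls differ only in whether the training judgments are removed. Under a train/test split like E, a duplicate of d1 could instead sit in the test half and be credited to the system.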

> STEVE R: I agree that this is an issue.  Do we know how serious a  
> problem it is in this collection?

Very serious.  In the pools there were cases of 50 or more near-
duplicate documents, and I know that in the collection there are at
least a few hundred copies of some documents.
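A rough sketch of why groups that large are a problem for E. It assumes each document is assigned to the training or test half independently at random; the proposal does not say how the split would be made. Under that assumption, a group of n near-duplicates stays on one side with probability 2 * 0.5**n, so a 50-document group is split essentially every time. The function names below are hypothetical.

```python
import random
from typing import List


def count_straddling_groups(groups: List[List[str]], seed: int = 0) -> int:
    """Randomly halve the documents and count near-duplicate groups that
    end up with members in both the training and the test half."""
    rng = random.Random(seed)
    straddling = 0
    for group in groups:
        # True = training half, False = test half, chosen per document.
        sides = {rng.random() < 0.5 for _ in group}
        if len(sides) == 2:
            straddling += 1
    return straddling


def prob_group_stays_together(n: int) -> float:
    """Probability that all n members land in the same half."""
    return 2 * 0.5 ** n


print(prob_group_stays_together(2))   # 0.5
print(prob_group_stays_together(50))  # about 1.8e-15
```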

>>>        E. "Test and Control": Splitting entire document set into
>>> training and test halves.  Only assessed documents that fall into the
:
:
>>> training.  This would have the disadvantage of only improving
>>> judgments on the test half of the collection, though one could do
>>> this twice, exchanging the roles of the training and test halves.
>
> I like the idea of doing it twice.  But one possible issue:  do you  
> imagine that any of the participants will try manual query  
> formulation for routing?  The two-fold replication might be  
> difficult in that case -- couldn't really have the same user doing  
> the formulation both ways round.

Yet another problem with E.  OK, E is DOA.

Dave
