Re: NDCG
Thank you everyone for your terrific comments. Thanks also to those
folks who sent some data off-list for testing.
One thing I'm finding worrisome is that quite a few variations of NDCG
are now in use in the literature. The first is the discount,
log2(rank) vs. log2(rank + 1), which we've been discussing here a bit.
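To make the difference concrete, here is a minimal sketch of the two
discounts. The function names are mine, purely for illustration. I am
assuming the log2(rank) variant leaves rank 1 undiscounted, since
log2(1) = 0 would otherwise mean dividing by zero.

```python
import math


def dcg_log_rank(gains):
    """DCG with a log2(rank) discount.

    Assumption: the discount is clamped to at least 1, so ranks 1 and 2
    are both undiscounted (log2(1) = 0 and log2(2) = 1).
    """
    return sum(g / max(1.0, math.log2(rank))
               for rank, g in enumerate(gains, start=1))


def dcg_log_rank_plus_one(gains):
    """DCG with a log2(rank + 1) discount, which starts decaying at rank 2."""
    return sum(g / math.log2(rank + 1)
               for rank, g in enumerate(gains, start=1))


if __name__ == "__main__":
    gains = [3, 2, 1]  # gains of the documents at ranks 1, 2, 3
    print(dcg_log_rank(gains), dcg_log_rank_plus_one(gains))
```

The same ranked list gets a different DCG under each discount, so
scores are only comparable when both sides use the same one.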
The second is a gain function which is 2^{rel}-1, where rel is some
relevance value. This gain function seems to come from MSR papers,
nearly all of which are working with collections that have relevance
values on a five point scale. (But not all... one MSR paper used
OHSUMED (!) and this same function; another used a five-point scale with
1 as the lowest value, so that all retrieved documents contribute some
gain.)
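A quick sketch of how this gain compares with raw relevance values,
and of what happens when the scale starts at 1 instead of 0. The
function names are hypothetical, not taken from any of the papers.

```python
def exponential_gain(rel):
    """The 2^rel - 1 gain function."""
    return 2 ** rel - 1


def raw_gain(rel):
    """Use the relevance value itself as the gain."""
    return rel


if __name__ == "__main__":
    # Five-point scale starting at 0: the lowest level contributes nothing.
    print([exponential_gain(r) for r in range(0, 5)])
    # Five-point scale starting at 1: every retrieved document gets some gain.
    print([exponential_gain(r) for r in range(1, 6)])
```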
This is of course in contrast to other papers which either use raw
relevance values as gains, or otherwise make up gain values out of
whole cloth. I think Tetsuya is the person who's done the most looking
at whether the choices of gain and discount affect results, and of
course Kal has pointed out rightly that these choices should be made to
reflect a real user model, however that is best done.
At any rate, this discussion has helped me tremendously in my quest to
compute NDCG values that are comparable to those in existing work.
I do remain somewhat worried that putting NDCG into trec_eval may be
the wrong thing to do: people will just use the default gain setting,
which is probably the wrong choice in nearly every case. At least
everyone would use the right math and sort their runs correctly before
scoring them, but that's not a very strong motivation.
Is there a strong desire out there for a standardized NDCG
implementation? How should it work?
(For reference, my current approach is to let the user specify literal
gain values for each relevance value on the command line. The default
is to use the relevance values as gains; this means that if some
collection had "standard" gains, they could be easily encoded in the
qrels. I think this is the most general approach but it is prone to
user error.)
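To illustrate the idea, here is a rough sketch in Python. It is not the
trec_eval code or its command-line interface: the names ndcg, gain_map
and k are hypothetical, and I am assuming a log2(rank + 1) discount.

```python
import math


def _dcg(gains):
    # Assumption: log2(rank + 1) discount, one of the variants in use.
    return sum(g / math.log2(rank + 1)
               for rank, g in enumerate(gains, start=1))


def ndcg(run_rels, judged_rels, gain_map=None, k=None):
    """NDCG for one topic.

    run_rels:    relevance values of the retrieved documents, in rank order
    judged_rels: relevance values of all judged documents for the topic
    gain_map:    optional {relevance value: gain}; by default the relevance
                 value itself is the gain
    k:           optional rank cutoff (None means no cutoff)
    """
    if gain_map is None:
        to_gain = lambda rel: rel
    else:
        # A relevance value missing from the map raises KeyError, so a
        # forgotten level is not silently scored.
        to_gain = lambda rel: gain_map[rel]

    run_gains = [to_gain(r) for r in run_rels][:k]
    ideal_gains = sorted((to_gain(r) for r in judged_rels), reverse=True)[:k]
    ideal = _dcg(ideal_gains)
    return _dcg(run_gains) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    run = [2, 0, 1]
    judged = [2, 1, 0, 0]
    print(ndcg(run, judged))                               # raw values as gains
    print(ndcg(run, judged, gain_map={0: 0, 1: 1, 2: 3}))  # user-supplied gains
```

The same run scores differently under the two gain settings, which is
exactly why a silent default worries me.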
Ian
- Follow-Ups:
- Re: NDCG
- From: Ian Soboroff <ian.soboroff@nist.gov>
- References:
- RE: NDCG
- From: Nick Craswell <nickcr@microsoft.com>