Re: evaluation of known item searches


Thijs,

Since there is many kind of method to judge the equality between a fragment and a known-item,
NIST chosen method is the following (see link 'first evaluation' to ISIS works in Shot Boundary html pages
http://www-nlpir.nist.gov/projects/t01v/shots.html): A comparison by sufficient overlapping:

The comparison is based on 2 parameters RMIN and RMAX used to define the conditions under which it is accepted that 2 fragments are not perfectly
aligned. "The criterion is based on a ``sufficient'' overlapping of the two corresponding segments. The length of the common portion must be at least RMIN
times the length of the longest segment and RMAX times the length of the shortest segment. Default values for RMIN and RMAX are respectively 0.333333
and 0.499999."
A graphic interpretation would be in the extreme case:
-------------------[----------|----------|----------]-----     longest segment
--------[----------|----------]----------                      shortest segment

Ramazan.
--

Thijs Westerveld wrote:

 Dear all,

In the video track guidelines it is said that the known-item searches will be automatically compared to reference derived from known-items identified in the topics. The measure will be average precision - the mean of the precision obtained after each known item is retrieved, using zero as the precision for known items not retrieved.

However, I don't think we've had a discussion on how to automatically compare search results to known items. When does a returned fragment match the known item? Do we have to have the exact same start and stop times, a certain amount of overlap, or something else?  To start this discussion, we made an evaluation script to score known-item searches (it's in perl, see attachment).

In deciding whether a fragment is relevant or not, we only look at its start time. We think, in a realistic setting, someone using a search system on a video archive will not stop watching a fragment just because the system thinks it's no longer relevant. Therefore, we disregard the stop time of the returned fragments.

Now, we think a fragment is relevant if its starting point lies between the start and stop times of one of the known items. In addition, we allow fragments to start just before the start of the known item.

-------------------|----------------------------------|----------
               KI-start                           KI-stop

       <--deltaT-->
      |_______________________________________________|
       <--------- relevant starting points ---------->
 

Feel free to experiment with the script and with other matching functions. Comments on our matching conditions and other suggestions are of course welcome.

Thijs Westerveld


#!/Utils/bin/perl # # Name:   trecvideval # # Author: Thijs Westerveld # Date:   25/07/2001 # # To evaluate results on Known Item topics for TREC's video track #  # usage: cat <results.xml> | trecvideval <topics.xml> # # trecvideval takes one or more trecvid topics and processes a set of results. # # Both topics and resulst should be in the official video track xml formats: # http://www-nlpir.nist.gov/projects/t01v/topics/videoTopic.dtd # http://www-nlpir.nist.gov/projects/t01v/topics/videoTopics.dtd # http://www-nlpir.nist.gov/projects/t01v/videoSearchResults.dtd # # The output of the script is a filtered version of <results.xml>, in which for each topic only the  # relevant results are shown (i.e. the results that match an known item). # # In addition for each topic, the average precision is reported (the mean of the precision obtained after each known item is retrieved, using zero as the precision for known items not retrieved). # Finally, at the end, these precision values are averaged over all known-item topics. # # A fragment is regarded relevant if its start point lies between the start and stop times of one of the known items # In addition, we allow fragments to start just before the known item. # # # ------------------------------|----------------------------------|-------------------------------------- #                            KI-start                           KI-stop # #                   <--deltaT--> #                  |_______________________________________________|      #                   <--------- relevant starting points ----------> # # The tolerance at the starting point can be varied by changing the value of deltaT.  #  # #  Note: Stop times of fragments are disregarded, because we think someone using a video search system will not #          stop watching a retrieved fragment just because the system thinks it's no longer relevant.  #          A user will stop watching either when he/she has found a what was searched for or  #          when he/she gets bored and thinks there's nothing there. # die "usage: cat <results> | trecvideval <topics.xml>\n" unless ($#ARGV == 0); $deltaT = 1; # tolerance at starting point (in seconds) %knownItems=(); $numKItopics=0; $sumavgprec=0; open(TOPICS, "$ARGV[0]")|| die "Cannot open $ARGV[0]\n"; # Read known items from topics descriptions while(<TOPICS>){   if (/videoTopic num/){     ($topic) = /num=\"(\d+)\"/;   }   if(/knownItem.*src/){     $knownItems{$topic}.="$_";   } } while(<STDIN>){   if(/VideoSearchResult tNum/){     print;     # This is a new Topic, so get KnownItems     @ki=();     ($topic)=/tNum=\"(\d+)\"/;     if(exists $knownItems{$topic}){   #  If there are known items for this topic, then read them into a list       $i=0;       foreach $item (split (/\n/, $knownItems{$topic})){           $hour=$min=$sec=$milsec=0;           # known item source           ($ki[$i]{src}) = ($item =~ m/src=\"([^\"]+)(.mpe?g)?\"/i);           $ki[$i]{src} =~ s/\..*$//;           # known item start time            ($start)=($item =~ m/start=\"([^\"]+)\"/);           ($houropt,$hour,$minopt,$min,$secopt,$sec,$milsecopt,$milsec) = ($start =~ m/((\d+)h)?((\d+)m)?((\d+)s)?((\d+)ms)?/);           $start=3600*$hour+60*$min+$sec+0.001*$milsec;           $ki[$i]{start} = $start;           # known item stop time           ($stop)=($item =~ m/stop=\"([^\"]+)\"/);           ($houropt,$hour,$minopt,$min,$secopt,$sec,$milsecopt,$milsec) = ($stop =~ m/((\d+)h)?((\d+)m)?((\d+)s)?((\d+)ms)?/);           $stop=3600*$hour+60*$min+$sec+0.001*$milsec;           $ki[$i]{stop} = $stop;           # next item           $i++;                 }       print "<!-- $i known-Items for this Topic -->\n";       # Initialise number of found Known Items and score       $numfound=0;       $score=0;     }     else {       # No known Items for this topic, so clear knownitem list       @ki=();       print "<!-- No known-Items for this Topic -->\n";     }   }   elsif(/item seqNum/){     # This is a result     if (@ki !=() ){       # If we there are known items,        #     then extract result info and compare to known items       $result=$_;       $hour=$min=$sec=$milsec=0;       #result rank       ($rank) = ($result =~ m/seqNum=\"([^\"]+)\"/);       # result source       ($src) = ($result =~ m/src=\"([^\"]+)(.mpe?g)?\"/i);       $src =~ s/\..*$//;       # result start time        ($start)=($result =~ m/start=\"([^\"]+)\"/);       ($houropt,$hour,$minopt,$min,$secopt,$sec,$milsecopt,$milsec) = ($start =~ m/((\d+)h)?((\d+)m)?((\d+)s)?((\d+)ms)?/);       $start=3600*$hour+60*$min+$sec+0.001*$milsec;              # result stop time       ($stop)=($result =~ m/stop=\"([^\"]+)\"/);       ($houropt,$hour,$minopt,$min,$secopt,$sec,$milsecopt,$milsec) = ($stop =~ m/((\d+)h)?((\d+)m)?((\d+)s)?((\d+)ms)?/);       $stop=3600*$hour+60*$min+$sec+0.001*$milsec;       # compare to each known item       foreach  $i  (0...$#ki){         #         #   Below is the matching function: compare sources check differences between start (and stop) times         #         if(uc($ki[$i]{src}) eq uc($src) && ($start >= ($ki[$i]{start}-$deltaT)) && ($start < ($ki[$i]{stop}))){           # MATCH           $numfound++;           # Add precision to total score           $score+=$numfound/$rank;            # make sure we don't find the same known item twice           $ki[$i]{src}="thisitemisalreadyfound";           print;           #   break;         }       }     }   }   elsif (/\<\/VideoSearchResult\>/){     # end of topic      if (@ki !=() ){        # IF there are know items print average precision for this topic        # avg prec. is mean of precisions  after each known item is retrieved (using 0 as prec. for K-I not retrieved        print "<!-- average precision = ",$score/($#ki+1)," -->\n";                #add average prec. to sum over all topics:        $sumavgprec+=$score/($#ki+1);        # ...and increase number of topics for which Known Items exist        $numKItopic++;      }      print;   }   else{     print;   }   }  print "<!-- MEAN AVG. PREC. over $numKItopic Known-Item topics:", $sumavgprec/$numKItopic," -->\n";

--
Ramazan Taban
--
Information Access Division - ITL
National Institute of Standards and Technology (NIST)
US Department of Commerce
Bldg. 225  Mailstop 8940 - Gaithersburg, MD  20899-8940   USA
Phone : +(301) 975-4529
 



Date Index | Thread Index | Problems or questions? Contact list-master@nist.gov