Face size




Hi there,

I saw on the website that a face must be at least a quarter of the frame
width and height to count as detected for the evaluation. That is a
really big face. I browsed some video clips and did not find a single
clip containing a face that big. Maybe we can change it to something
like 30x30 pixels?
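
To put numbers on the gap between the two thresholds, here is a minimal
sketch. The 352x240 frame size is an assumption for illustration only (it
is not stated on the list), and the function names are hypothetical.

```python
def min_face_size(frame_width, frame_height, fraction=0.25):
    """Smallest face (width, height) in pixels allowed by a fractional rule."""
    return (frame_width * fraction, frame_height * fraction)


def face_qualifies(face_w, face_h, frame_width, frame_height, fraction=0.25):
    """True if the face is at least `fraction` of the frame in both dimensions."""
    min_w, min_h = min_face_size(frame_width, frame_height, fraction)
    return face_w >= min_w and face_h >= min_h


# Assumed frame size, for illustration only.
FRAME_W, FRAME_H = 352, 240

print(min_face_size(FRAME_W, FRAME_H))           # (88.0, 60.0)
print(face_qualifies(30, 30, FRAME_W, FRAME_H))  # False
```

Under that assumed frame size, the quarter-frame rule would demand a face
of at least 88x60 pixels, so a 30x30 face would not count.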

- Huiping



-----Original Message-----
From: Ching-yung Lin [mailto:chingyung@us.ibm.com]
Sent: Wednesday, May 29, 2002 12:57 PM
To: Multiple recipients of list
Subject: Re: Video TREC mailing list



Hi,

   We met a similar problem. I am working on visual feature analysis, which
needs to consider the relationship between frames. I saw there are many
short shots (< 4 frames), most of which are transitions or oversegmentations
caused by noise. Those short shots may cause problems for us. Therefore, I
think eliminating short shots (< 0.5 second) is a good idea.

Regards,
Ching-Yung

=============================================
Ching-Yung Lin, Ph.D.      Email: cylin@watson.ibm.com
Research Staff Member
IBM T. J. Watson Research Center
19 Skyline Dr., Hawthorne, NY 10532
Phone: (914)784-7822;  Fax: (914)784-7455



 

From: Giridharan Iyengar/Watson/IBM@IBMUS
Sent by: trecvid@nist.gov
Sent: 05/29/2002 12:36 PM
To: Multiple recipients of list <trecvid@nist.gov>
cc:
Subject: Re: Video TREC mailing list
Please respond to trecvid



Paul:

I agree with Roman. This is very much in line with the approach that we are
taking with respect to the audio features. I would like to add that a
monologue should also have such a minimum duration and that 15 video frames
is too short for that. I would suggest that for monologues, the minimum
duration should be 2 seconds (60 frames).
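
The frame counts and durations quoted in this thread can be checked with a
small conversion sketch. It assumes a nominal 30 frames per second, which
is what the figures in the thread imply (1 frame is roughly 33 ms); the
actual rate of the collection may differ slightly, and the function names
are hypothetical.

```python
def frames_to_seconds(frames, fps=30.0):
    """Duration in seconds of a run of `frames` frames at `fps` frames/second."""
    return frames / fps


def seconds_to_frames(seconds, fps=30.0):
    """Number of frames (rounded to the nearest frame) covering `seconds`."""
    return round(seconds * fps)


print(frames_to_seconds(15))   # 0.5
print(seconds_to_frames(2))    # 60
```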

-giri
Giridharan Iyengar
Research Staff Member
Human Language Technologies
IBM TJ Watson Research Center
Yorktown Heights, NY 10598
914 945 2474

From: Paul Over <over@nist.gov>
Sent by: trecvid@nist.gov
Sent: 05/29/2002 12:04 PM
To: Multiple recipients of list <trecvid@nist.gov>
cc:
Subject: Video TREC mailing list
Please respond to trecvid




Posting for the group at DCU.....

I am currently working on audio feature extraction for video
TREC, and encountered an issue which is potentially relevant
to others. The common shot boundary files provide a
description of the duration of each shot, which can be as
short as 1 frame. This is roughly 33 ms in time.

I am working on speech and music (instrumental sound)
extraction. To distinguish between music and speech, humans
require several hundred milliseconds of sound. However, the
current shots are often far shorter than that,
which creates a problem in these applications.

My proposal to solve this problem is a minimum
shot duration of 15 frames (i.e. 0.5 sec. in time), below
which shots are not considered relevant for audio
feature extraction. In this way the common shot boundary
numbering is preserved, but audio features will not be
extracted from shots with fewer than 15 frames, as they are
irrelevant from an audio perspective.
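
A minimal sketch of this proposal, under stated assumptions: the `Shot`
record, its fields, and the inclusive frame ranges are hypothetical and do
not reflect the actual format of the common shot boundary files. Short
shots are skipped for audio feature extraction, while every shot keeps its
original number.

```python
from typing import NamedTuple


class Shot(NamedTuple):
    """Hypothetical shot record; not the real shot boundary file format."""
    number: int       # shot number from the common shot boundary reference
    start_frame: int  # first frame of the shot
    end_frame: int    # last frame of the shot (inclusive)

    @property
    def length(self):
        return self.end_frame - self.start_frame + 1


MIN_AUDIO_FRAMES = 15  # proposed minimum, about 0.5 s


def shots_for_audio(shots, min_frames=MIN_AUDIO_FRAMES):
    """Return the shots long enough for audio features, numbers unchanged."""
    return [s for s in shots if s.length >= min_frames]


shots = [Shot(1, 0, 0), Shot(2, 1, 40), Shot(3, 41, 50), Shot(4, 51, 200)]
print([s.number for s in shots_for_audio(shots)])  # [2, 4]
```

Shots 1 and 3 (1 and 10 frames long) are left out, but shots 2 and 4 are
still reported under their original numbers.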

Roman Jarina
jarinar@eeng.dcu.ie
Centre for Digital Video Processing
Dublin City University

-------------------
PS: I wonder if there should be a minimum shot duration
for the other features too?

- Paul







