RnD

Speech recognition

I first encountered speech recognition in 1993 while pursuing a master's degree at BUPT, where my thesis topic was isolated digit recognition. In 1998, to continue this research interest, I joined the speech group at NLPR (National Laboratory of Pattern Recognition).

It was a memorable four years: I met many great supervisors, teachers, and friends, and had opportunities to take part in both research and industry speech recognition projects. I was in charge of building the lab's real-time LVCSR system and commercializing speech recognition. From today's view, it was too early to really solve the speech recognition problem.

Thesis: large-vocabulary continuous Chinese speech recognition

  • Acoustic model
  • N-best, One-pass, A* search, decision tree, language model, real-time implementation

Audio / music search lets you find your favorite music and songs in a database of millions of tracks from a mobile phone in less than 3 s, anytime and anywhere. It extracts small fingerprints from an audio clip and indexes the large-scale fingerprint collection with an inverted index, achieving high accuracy (>97% on millions of songs) and low latency with a very efficient implementation. To learn more, contact me.
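The inverted-index idea can be illustrated with a minimal sketch. It is not the actual implementation: the fingerprints are stand-in integers, and the names `build_index` and `search`, as well as the offset-voting step, are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_index(songs):
    """Map each fingerprint hash to the (song_id, position) pairs where it occurs."""
    index = defaultdict(list)
    for song_id, hashes in songs.items():
        for pos, h in enumerate(hashes):
            index[h].append((song_id, pos))
    return index


def search(index, query_hashes):
    """Vote for (song_id, time offset) pairs; a true match yields many
    hits that agree on the same offset between query and song."""
    votes = Counter()
    for qpos, h in enumerate(query_hashes):
        for song_id, pos in index.get(h, ()):
            votes[(song_id, pos - qpos)] += 1
    if not votes:
        return None
    (song_id, offset), count = votes.most_common(1)[0]
    return song_id, offset, count


# Toy database: each song is a sequence of hypothetical fingerprint hashes.
songs = {
    "song_a": [11, 42, 7, 93, 58, 21, 64],
    "song_b": [5, 42, 8, 77, 93, 30, 12],
}
index = build_index(songs)
result = search(index, [7, 93, 58])  # short clip taken from song_a
print(result)
```

Lookup cost depends on how many database entries share a hash with the query, not on the total number of songs, which is what keeps latency low on a large collection.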

MFoM: metric oriented learning algorithm optimization

The MFoM learning algorithm (SIGIR’03, ICML’04, ACM TOIS’06) is a general framework for learning an ML model by directly optimizing the expected performance metric of the practical application scenario. It extends MCE (minimum classification error; Juang, B.-H., Chou, W. and Lee, C.-H., 1997), which is used to train HMMs by minimizing the classification error rate rather than maximizing log-likelihood. Please refer to our papers for more details, and contact me about the implementation code.
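To give a flavor of what optimizing a metric directly can look like, here is a small sketch that replaces the hard 0/1 decisions inside F1 with sigmoid-smoothed ones, so the metric becomes differentiable in the classifier scores. This is an illustration only, not the formulation from the papers; the function names, the choice of F1, and the smoothing parameter `alpha` are assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def smoothed_f1(scores, labels, alpha=5.0):
    """Smooth approximation of F1.

    scores: real-valued classifier outputs (positive means "predict positive").
    labels: ground-truth 0/1 labels.
    alpha:  sharpness of the soft decision (assumed parameter).
    """
    tp = fp = fn = 0.0
    for s, y in zip(scores, labels):
        p = sigmoid(alpha * s)  # soft decision in (0, 1)
        tp += p * y
        fp += p * (1 - y)
        fn += (1 - p) * y
    return 2 * tp / (2 * tp + fp + fn)


def hard_f1(scores, labels):
    """Ordinary F1 computed from thresholded decisions."""
    tp = sum(1 for s, y in zip(scores, labels) if s > 0 and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s > 0 and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s <= 0 and y == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


scores = [2.0, 1.0, -1.5, -0.5, 0.8]
labels = [1, 1, 0, 1, 0]
print(smoothed_f1(scores, labels), hard_f1(scores, labels))
```

As `alpha` grows, the smoothed value approaches the hard F1, while a moderate `alpha` keeps useful gradients for training.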

Music summary

Music summarization extracts the most informative clip from a whole song or music recording. This requires analyzing the music structure and identifying the important pattern to use as the summary. In this project, an audio pattern mining algorithm was developed to find repeated segments in songs and music; the repeated segment is extracted as the summary. This is motivated by the human ability to identify a repeated melody in audio, and by the observation that this repeated melody mostly represents the song.
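The repeated-segment idea can be sketched on a toy input. The code below is not the project's pattern mining algorithm: it assumes the audio has already been quantized into a sequence of frame symbols, uses a fixed segment length, and simply counts exact repeats. The function name `most_repeated_segment` is hypothetical.

```python
from collections import defaultdict


def most_repeated_segment(frames, length):
    """Return (segment, start_positions) for the fixed-length window that
    occurs most often in `frames`; ties go to the earliest occurrence."""
    positions = defaultdict(list)
    for start in range(len(frames) - length + 1):
        positions[tuple(frames[start:start + length])].append(start)
    if not positions:
        return None  # sequence shorter than the requested length
    segment, starts = max(
        positions.items(), key=lambda kv: (len(kv[1]), -kv[1][0])
    )
    return segment, starts


# Each letter stands for one quantized audio frame (assumed representation).
frames = list("ABCDXXABCDYYABCDZ")
segment, starts = most_repeated_segment(frames, 4)
print("".join(segment), starts)
```

Real audio rarely repeats exactly, so a practical system would compare segments with a similarity measure rather than exact equality.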

Image annotation
two-modality based image retrieval

Around 2006, when automatic image annotation was attracting more and more interest in academia, I started working in this research direction, developing algorithms for automatic image tagging. Image tagging is a multi-label problem, similar to the problem we addressed when developing the MC MFoM learning algorithm.

At that time, very few public labeled image data sets were available; there was only the Corel data set, with 5,000 images.
