Media and Text Mining

Basic information
Neptun code: 
Hosting department: BME-VIK Business Information Systems
MSc - Master of Science
Specialization subject
MSc basic obligatory subject (KVT)
Course coordinator: 
The course introduces students to the identification, assessment and analysis of intelligent information search systems and multimedia retrieval systems. It also covers content-handling techniques, where the content may be text, media, or both.

Obtained skills and expertise:

  • Methods for media and text analysis; search techniques
  • Developing media retrieval and search systems for enterprises

Syllabus:

  • Mathematical handling of space and time, methods for position determination
  • Text analysis. Stemming algorithms: the Porter and Lovins methods. Language detection, language dependency.
  • Indexing and ranking procedures, PageRank, web graph methods, Boolean search, tf-idf metrics, SVD, cosine distance. Dimensionality reduction: principal component analysis (PCA) and independent component analysis (ICA).
  • Hierarchical taxonomy systems, catalogue search, thesauri.
  • Image retrieval. Line detection, skeletonization. Images and time series in multimedia.
  • Media indexing. Probabilistic models in video and audio search; applications of Hidden Markov Models.
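
As a rough illustration of the tf-idf and cosine-distance topics above, the following minimal Python sketch ranks a small document collection against a query by cosine similarity (cosine distance is 1 minus this value). It assumes a naive lowercase whitespace tokenizer with no stemming or stop-word removal, raw term counts, and the idf variant log(N / df); the function names `tf_idf_vectors`, `cosine` and `rank` are hypothetical and are not taken from the course material.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenizer, no stemming or stop words.
    return text.lower().split()


def tf_idf_vectors(docs):
    """Return one sparse tf-idf vector (dict term -> weight) per document, plus the idf table.

    Uses raw term frequency and idf = log(N / df), one common variant among several.
    """
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term occurs at least once.
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: tf[t] * idf[t] for t in tf})
    return vectors, idf


def cosine(u, v):
    # Cosine similarity of two sparse vectors; 0.0 if either vector is all zeros.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def rank(query, docs):
    # Returns (score, doc_index) pairs, highest score first.
    vectors, idf = tf_idf_vectors(docs)
    q_tf = Counter(tokenize(query))
    # Query terms that never occur in the collection get weight 0.
    q_vec = {t: c * idf.get(t, 0.0) for t, c in q_tf.items()}
    scores = [(cosine(q_vec, vec), i) for i, vec in enumerate(vectors)]
    return sorted(scores, reverse=True)
```

For example, `rank("cat mat", ["the cat sat on the mat", "the dog chased the cat", "stock markets fell sharply today"])` places the first document highest, the second next, and gives the third a score of 0.0, since it shares no terms with the query.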

Literature:

  • Blanken, de Vries, Blok, Feng (eds.): Multimedia Retrieval. Springer, 2007.
  • Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze: Introduction to Information Retrieval. Cambridge University Press, 2008.
  • Ronen Feldman, James Sanger: The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data. Cambridge University Press, 2007.