Deep learning based fusion of acoustic and linguistic features for spoken document retrieval