Research Profile


I am currently a Professor at the School of Electrical Engineering and Computer Science (SEECS), National University of Sciences and Technology (NUST), Islamabad, Pakistan. I am also an Adjunct Senior Lecturer at the School of Computer Science and Software Engineering at The University of Western Australia in Perth, Australia. Previously, I was a Senior Researcher at the German Research Center for Artificial Intelligence (DFKI) and an Adjunct Lecturer at Kaiserslautern University of Technology (TUKL), Germany. I received my PhD with the highest distinction in computer engineering from TUKL in 2008. My research interests include machine learning and pattern recognition, with a special emphasis on applications in document image analysis. I have co-authored over 100 publications in international peer-reviewed conferences and journals in this area; my Google Scholar profile shows over 5,000 citations of my papers.

Recent Publications

Viewpoint invariant semantic object and scene categorization with RGB-D sensors

Understanding the semantics of objects and scenes using multi-modal RGB-D sensors serves many robotics applications. The key challenges for accurate RGB-D image recognition are the scarcity of training data, variations due to viewpoint changes, and the heterogeneous nature of the data. We address these problems and propose a generic deep learning framework that uses a pre-trained convolutional neural network as a feature extractor for both the colour and depth channels. We propose a rich multi-scale feature representation, referred to as the convolutional hypercube pyramid (HP-CNN), that encodes discriminative information from the convolutional tensors at different levels of detail. We also present a technique to fuse the proposed HP-CNN with the activations of fully connected neurons using an extreme learning machine classifier in a late fusion scheme, which leads to a highly discriminative and compact representation. To further improve performance, we devise HP-CNN-T, a view-invariant descriptor extracted from a multi-view 3D object pose (M3DOP) model. M3DOP is learned from over 140,000 RGB-D images that are synthetically generated by rendering CAD models from different viewpoints. Extensive evaluations on four RGB-D object and scene recognition datasets demonstrate that HP-CNN and HP-CNN-T consistently outperform state-of-the-art methods on several recognition tasks by a significant margin.
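The multi-scale idea behind the hypercube pyramid can be illustrated with a small sketch: pool a convolutional activation tensor over spatial grids of increasing resolution and concatenate the results. The grid sizes and max-pooling choice here are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

def spatial_pyramid_pool(tensor, levels=(1, 2, 4)):
    """Pool a C x H x W activation tensor over increasingly fine
    spatial grids and concatenate the results into one vector."""
    C, H, W = tensor.shape
    features = []
    for g in levels:
        # split the spatial plane into a g x g grid of cells
        for i in range(g):
            for j in range(g):
                h0, h1 = i * H // g, (i + 1) * H // g
                w0, w1 = j * W // g, (j + 1) * W // g
                cell = tensor[:, h0:h1, w0:w1]
                features.append(cell.max(axis=(1, 2)))  # max-pool each channel
    return np.concatenate(features)

# e.g. a 64-channel 8x8 activation map -> 64 * (1 + 4 + 16) = 1344-dim vector
feat = spatial_pyramid_pool(np.random.rand(64, 8, 8))
print(feat.shape)  # (1344,)
```

Coarse levels capture global layout while fine levels preserve local detail, which is what makes the concatenated descriptor discriminative across scales.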

DeepParse: Trainable Postal Address Parser

Postal applications were among the first beneficiaries of advances in document image processing techniques due to their economic significance. Automating postal services requires integrating contributions from most image processing domains, from image acquisition and preprocessing to interpretation through symbol, character and word recognition. Lately, machine learning approaches have been deployed for postal address processing. The parsing problem has been explored with a range of techniques, such as regular expressions, CRFs, HMMs, decision trees and SVMs. These traditional techniques are designed on the assumption that the data is free from OCR errors, which limits their adaptability in real-world scenarios. Furthermore, their performance degrades in the presence of non-standardized addresses, which causes intermixing of similar classes. In this paper, we present the first trainable neural network based robust architecture, DeepParse, for postal address parsing that tackles these issues and can be applied to any Named Entity Recognition (NER) problem. The architecture takes input at three granularity levels: characters, character trigrams and words, to extract and learn features and classify the addresses. The model was trained on a synthetically generated dataset and tested on real-world addresses. DeepParse has also been evaluated on the CoNLL2003 NER dataset, where it achieved 90.44%, on par with the state-of-the-art.
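The three input granularities can be sketched as follows. The boundary padding symbol is an illustrative assumption; the paper's exact tokenization may differ.

```python
def granularities(address):
    """Decompose an address into the three granularity levels fed to the
    network: characters, character trigrams, and words. '#' marks the
    string boundary (an illustrative padding choice)."""
    chars = list(address)
    padded = "#" + address + "#"
    trigrams = [padded[i:i + 3] for i in range(len(padded) - 2)]
    words = address.split()
    return chars, trigrams, words

chars, trigrams, words = granularities("House 12 Street 45 Islamabad")
print(trigrams[:3])  # ['#Ho', 'Hou', 'ous']
```

Character and trigram views keep the model robust to OCR errors inside a word, while the word view supplies context for classifying each address field.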

Table Detection in Document Images using Foreground and Background Features

Table detection is an important step in many document analysis systems. It is a difficult problem due to the variety of table layouts and encoding techniques, and the similarity of tabular regions to non-tabular document elements. Earlier approaches to table detection are based on heuristic rules or require additional PDF metadata, while recently proposed machine learning methods have shown good results. This paper describes a performance improvement to these table detection techniques based on foreground and background features. The proposed solution builds on the observation that tables tend to contain more numeric data, and therefore applies color coding as a signal for telling apart numeric and textual data. A deep learning based Faster R-CNN is used to detect tabular regions in document images. To gauge the performance of the proposed solution, the publicly available UNLV dataset is used. Performance measures indicate an improvement over the best in-class strategies.
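A minimal sketch of the numeric-coloration signal: classify each token and assign numeric tokens one color and textual tokens another before rendering the page for the detector. The RGB values and the regular expression are illustrative assumptions, not the paper's exact scheme.

```python
import re

# illustrative colors; the actual coloration scheme may differ
NUMERIC_COLOR = (255, 0, 0)
TEXTUAL_COLOR = (0, 0, 255)

def token_color(token):
    """Color-code a token: numbers (integers, decimals, percentages,
    currency-like strings) get one color, ordinary words another."""
    is_numeric = re.fullmatch(r"[-+$€£]?\d[\d,.]*%?", token) is not None
    return NUMERIC_COLOR if is_numeric else TEXTUAL_COLOR

row = ["Revenue", "2019", "1,234.56", "up", "12%"]
print([token_color(t) for t in row])
```

Because tables are dense in numeric tokens, a page rendered this way gives the Faster R-CNN an extra channel of evidence for separating tabular from non-tabular regions.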

Automated Military Vehicle Detection From Low-Altitude Aerial Images

Detecting military vehicles and distinguishing them from non-military vehicles is a significant challenge in the defence sector. Detection of military vehicles can help to identify enemy movements and hence enable early precautionary measures. Recently, many deep learning based techniques have been proposed for vehicle detection. However, they are developed using datasets that are not useful when military-specific vehicle training and detection is required, and their hyper-parameters are not tuned for low-altitude aerial imagery. We aim to develop a state-of-the-art deep learning framework to detect military vehicles alongside standard non-military vehicles. The major bottleneck in applying deep learning frameworks to military vehicle detection is the lack of available datasets. In this context, we prepared a dataset of low-altitude aerial images that comprises real data (taken from videos of military shows) and toy data (taken from YouTube videos). Our dataset is categorized into three main types: military vehicle, non-military vehicle and non-vehicle. We employed state-of-the-art object detection algorithms to distinguish military and non-military vehicles; specifically, the three deep architectures used for this purpose are faster region-based convolutional neural networks (Faster R-CNN), region-based fully convolutional networks (R-FCN), and the single shot multibox detector (SSD). We observed the impact of increasing the training data using the SSD architecture, and also performed a comparative analysis of the three architectures under the same increase in training data. The experimental results show that training the deep architectures on the prepared dataset allows seven types of military and four types of non-military vehicles to be recognized, and that the models handle complex scenarios by differentiating vehicles from surrounding objects.
We report the mean average precision (mAP) and weighted average precision (WAP) obtained with the three adopted architectures, with Faster R-CNN giving the highest WAP of around 62.79% for the military vehicle category after 380,530 iterations (3 epochs).
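For reference, the difference between the two reported metrics can be sketched as follows: mAP averages the per-class average precisions uniformly, while a weighted variant weights each class by its number of ground-truth instances (a common convention; the paper's exact definition may differ). The class names and numbers below are hypothetical, not figures from the paper.

```python
def mean_ap(ap_per_class):
    """Unweighted mean of per-class average precisions (mAP)."""
    return sum(ap_per_class.values()) / len(ap_per_class)

def weighted_ap(ap_per_class, count_per_class):
    """Average precision weighted by the number of ground-truth
    instances in each class (WAP)."""
    total = sum(count_per_class.values())
    return sum(ap_per_class[c] * count_per_class[c] for c in ap_per_class) / total

# hypothetical per-class results for illustration only
ap = {"tank": 0.70, "apc": 0.55, "truck": 0.62}
n  = {"tank": 100,  "apc": 50,   "truck": 250}
print(round(mean_ap(ap), 4), round(weighted_ap(ap, n), 4))
```

The weighted variant is less sensitive to rare classes with unstable AP estimates, which matters when class frequencies in the dataset are highly imbalanced.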

3DAirSig: A Framework for Enabling In-Air Signatures Using a Multi-Modal Depth Sensor

In-air signature is a new modality, essential for user authentication and access control in non-contact mode, that has been actively studied in recent years. However, it has been treated as a conventional online signature, which is essentially a 2D spatial representation, even though the modality bears far more potential due to an important hidden depth feature. Existing methods for in-air signature verification neither capture this unique depth feature explicitly nor fully explore its potential for verification. Moreover, these methods rely on heuristic approaches for fingertip or palm-centre detection, which are not feasible in practice. Inspired by the great progress in deep-learning-based hand pose estimation, we propose a real-time in-air signature acquisition method that estimates hand joint positions in 3D from a single depth image. The predicted 3D position of the fingertip is recorded for each frame. We present four different implementations of a verification module based on the extracted depth and spatial features, and perform an ablation study to explore the impact of the depth feature in particular. For matching, we employ the widely used multidimensional dynamic time warping (MD-DTW) algorithm. We created a new database containing 600 signatures recorded from 15 different subjects and performed extensive evaluations on it. Our method, called 3DAirSig, achieved an equal error rate (EER) of 0.46%. Experiments showed that depth itself is an important feature that is sufficient for in-air signature verification. The dataset will be made publicly available.
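The matching step can be illustrated with a minimal MD-DTW sketch over 3D fingertip trajectories, using Euclidean distance between frames as the local cost (the standard formulation; the paper's windowing and normalisation details are not reproduced here).

```python
import math

def md_dtw(a, b):
    """Multidimensional dynamic time warping between two trajectories
    of 3D points; local cost is the Euclidean distance between frames."""
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # skip a frame of a
                                 D[i][j - 1],      # skip a frame of b
                                 D[i - 1][j - 1])  # match both frames
    return D[n][m]

sig1 = [(0, 0, 0), (1, 1, 2), (2, 2, 4)]
sig2 = [(0, 0, 0), (0.5, 0.5, 1), (1, 1, 2), (2, 2, 4)]  # same stroke, slower
print(md_dtw(sig1, sig1))  # identical signatures -> 0.0
```

DTW's elastic alignment is what makes the comparison robust to signing speed: a slower rendition of the same stroke scores close to zero, while a genuinely different trajectory accumulates a large cost.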

Representation learning with deep extreme learning machines for efficient image set classification

Efficient and accurate representation of a collection of images that belong to the same class is a major research challenge for practical image set classification. Existing methods either make prior assumptions about the data structure or perform heavy computations to learn the structure from the data itself. In this paper, we propose an efficient image set representation that makes no prior assumptions about the structure of the underlying data. We learn the nonlinear structure of image sets with deep extreme learning machines, which are very efficient and generalize well even with a limited number of training samples. Extensive experiments on a broad range of public datasets for image set classification show that the proposed algorithm consistently outperforms state-of-the-art image set classification methods in terms of both speed and accuracy.
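The building block of the approach, a single-hidden-layer extreme learning machine, can be sketched as follows: input weights are random and fixed, and only the output weights are solved in closed form, which is what makes training so fast. This is the generic ELM, not the paper's full deep architecture.

```python
import numpy as np

def elm_train(X, y, hidden=40, seed=0):
    """Single-hidden-layer extreme learning machine: input weights are
    random and stay fixed; only the output weights are learned, in
    closed form, by least squares."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], hidden))
    b = rng.normal(size=hidden)
    H = np.tanh(X @ W + b)                        # random nonlinear features
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)  # closed-form solve
    return W, b, beta

def elm_predict(X, model):
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta

# toy XOR problem, which no linear classifier can solve
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([-1., 1., 1., -1.])
model = elm_train(X, y)
print(np.sign(elm_predict(X, model)))  # recovers the XOR labels
```

Because the only learned parameters come from one least-squares solve, training cost is dominated by a single matrix factorisation, in contrast to iterative backpropagation.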

A Multi-faceted OCR Framework for Artificial Urdu News Ticker Text Recognition

Content-based information search and retrieval has allowed for easier access to data. While Latin-based scripts have gained attention and support from academia and industry, there is limited support for cursive-script languages such as Urdu. In this paper, we present the first instance of Urdu news ticker detection and recognition. The presented solution allows the transcription, indexing and captioning of Urdu news video content to be automated. We present the first comprehensive data set, to our knowledge, for Urdu news ticker recognition, collected from 41 different news channels. The data set covers both high- and low-quality channels as well as distorted and blurred news tickers, making it an ideal test case for any future automatic Urdu news recognition system. We identify and address the key challenges in Urdu news ticker text recognition, and further propose an adjustment to the ground-truth labeling strategy focused on improving the readability of the recognized output. Finally, we propose and present results from a Bi-Directional Long Short-Term Memory (BDLSTM) network architecture for news ticker text recognition. Our custom-trained model outperforms Google's commercial OCR engine in two of the four experiments conducted.
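Comparisons like the one against a commercial OCR engine are conventionally scored with character error rate (CER), i.e. edit distance normalised by reference length. The sketch below shows that standard metric; it is an assumption that the paper's evaluation protocol matches it exactly.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def cer(reference, hypothesis):
    """Character error rate: edit distance normalised by reference length."""
    return levenshtein(reference, hypothesis) / max(len(reference), 1)

print(cer("خبریں", "خبرین"))  # one substituted character out of five -> 0.2
```

Character-level scoring suits cursive Urdu well, since segmentation ambiguities make word-level accuracy a much blunter measure.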

Automated Forgery Detection in Multispectral Document Images Using Fuzzy Clustering

Multispectral imaging allows images to be analysed in multiple spectral bands. Over the past three decades, airborne and satellite multispectral imaging have been the focus of extensive research in remote sensing. In recent years, ground-based multispectral imaging has attracted immense interest in fields ranging from computer vision and medical imaging to art, archaeology and computational forensics. The rich information content of multispectral images allows forensic experts to examine the chemical composition of forensic traces. Due to its rapid, non-contact and non-destructive characteristics, multispectral imaging is an effective tool for the visualization, age estimation, detection and identification of forensic traces in document images. Ink mismatch is a key indicator of forgery in a document: inks of different materials exhibit different spectral signatures even if they have the same color. Multispectral analysis of questioned document images allows visually similar inks to be identified and discriminated. In this paper, an efficient automatic ink mismatch detection technique is proposed that uses fuzzy C-means clustering to divide the spectral responses of ink pixels in handwritten notes into clusters corresponding to the unique inks used in the document. Sauvola's local thresholding technique is employed to efficiently segment foreground text from the document image, and feature selection is used to optimize the performance of the proposed method. The presented method provides better ink discrimination results than state-of-the-art methods.
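The clustering step can be sketched with the standard fuzzy C-means algorithm on per-pixel spectral responses. This is the textbook formulation only; the Sauvola segmentation and feature selection stages of the pipeline are not reproduced, and the toy 1-D "spectra" below are hypothetical.

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, iters=100, seed=0):
    """Standard fuzzy C-means: alternate between updating cluster
    centres and the soft membership matrix U (n samples x c clusters)."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)          # memberships sum to 1 per pixel
    p = 2.0 / (m - 1.0)
    for _ in range(iters):
        Um = U ** m
        centres = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centres[None], axis=2) + 1e-12
        inv = d ** (-p)                        # membership ∝ inverse distance
        U = inv / inv.sum(axis=1, keepdims=True)
    return U, centres

# toy "spectral responses": six ink pixels from two visually similar inks
X = np.array([[0.10], [0.12], [0.11], [0.80], [0.82], [0.79]])
U, centres = fuzzy_c_means(X)
print(np.sort(centres.ravel()).round(2))  # two centres, one per ink
```

The soft memberships are the point of using fuzzy rather than hard clustering here: pixels whose spectra sit between two inks keep partial membership in both clusters instead of being forced into one.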

Journal Papers

2010 – Today
2000 – 2009

Conference Publications

2010 – Today
2000 – 2009

Getting in touch is easy!

Your questions and comments are important to us. Email us with any inquiries or call the number given; we would be happy to answer your questions and set up a meeting with you. Our experts are glad to help with any questions you may have.