Posts by Collection

portfolio

publications

IS-CAM: Integrated Score-CAM for axiomatic-based explanations

Published in Responsible Computer Vision workshop, CVPR, 2021

Rakshit Naidu, Ankita Ghosh, Yash Maurya, Shamanth R Nayak K, Soumya Snigdha Kundu

Convolutional Neural Networks (CNNs) have long been regarded as black-box models because humans cannot interpret their inner workings. In an attempt to make CNNs more interpretable and trustworthy, we propose IS-CAM (Integrated Score-CAM), which introduces an integration operation into the Score-CAM pipeline to achieve visually sharper attribution maps, as verified quantitatively. Our method is evaluated on 2000 randomly selected images from the ILSVRC 2012 validation set, demonstrating that IS-CAM generalises across different models and methods.
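
A minimal sketch of the idea in PyTorch (assuming pre-extracted activation maps from the last convolutional layer; function and parameter names are illustrative, not the authors' released code): each channel's Score-CAM weight is averaged over a series of progressively scaled activation masks before the maps are combined into the saliency map.

```python
import torch
import torch.nn.functional as F

def is_cam(model, image, target_class, activations, n_steps=10):
    # image: (1, 3, H, W); activations: (K, h, w) feature maps for `image`
    _, _, H, W = image.shape
    weights = []
    for a in activations:  # one activation map per channel
        a = F.interpolate(a[None, None], size=(H, W), mode="bilinear",
                          align_corners=False)[0, 0]
        a = (a - a.min()) / (a.max() - a.min() + 1e-8)   # normalise to [0, 1]
        score = 0.0
        for i in range(1, n_steps + 1):  # integration over scaled masks
            masked = image * (i / n_steps) * a
            with torch.no_grad():
                score += model(masked).softmax(dim=1)[0, target_class]
        weights.append(score / n_steps)  # averaged ("integrated") channel weight
    weights = torch.stack(weights)
    saliency = F.relu((weights[:, None, None] * activations).sum(dim=0))
    return saliency  # (h, w); up-sample for visualisation
```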

Download here

XCI-Sketch: Extraction of Color Information from Images for Generation of Colored Outlines and Sketches

Published in Machine Learning for Creativity and Design workshop, NeurIPS, 2021

Harsh Rathod, Manisimha Varma, Parna Chowdhury, Sameer Saxena, V Manushree, Ankita Ghosh, Sahil Khose

Sketches are a medium to convey a visual scene from an individual’s creative perspective. The addition of color substantially enhances the overall expressivity of a sketch. This paper proposes two methods to mimic human-drawn colored sketches by utilizing the Contour Drawing Dataset. Our first approach renders colored outline sketches by applying image processing techniques aided by k-means color clustering. The second method uses a generative adversarial network to develop a model that can generate colored sketches from previously unobserved images. We assess the results obtained through quantitative and qualitative evaluations.
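
A hedged sketch of the colour-quantisation step behind the first approach (the k-means clustering plus an edge detector; the library choices, the Canny edge detector, and all parameter values are illustrative assumptions, not necessarily the paper's exact pipeline):

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def colored_outline(image_bgr, k=8, low=50, high=150):
    # Quantise the palette with k-means, then paint detected edges with
    # the cluster colour of the underlying pixel.
    h, w, _ = image_bgr.shape
    pixels = image_bgr.reshape(-1, 3).astype(np.float32)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)
    quantised = km.cluster_centers_[km.labels_].reshape(h, w, 3).astype(np.uint8)

    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, low, high)
    outline = np.full_like(image_bgr, 255)      # white canvas
    outline[edges > 0] = quantised[edges > 0]   # coloured strokes
    return outline
```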

Download here

Semi-Supervised Classification and Segmentation on High Resolution Aerial Images

Published in Tackling Climate Change with Machine Learning workshop, NeurIPS, 2021

Ankita Ghosh, Sahil Khose, Abhiraj Tiwari

FloodNet is a high-resolution image dataset acquired by a small UAV platform (DJI Mavic Pro quadcopters) after Hurricane Harvey. The dataset presents a unique challenge: advancing post-disaster damage assessment with only a small amount of labeled data alongside a large unlabeled set. We propose a solution to its classification and semantic segmentation challenges. We approach the problem by generating pseudo-labels for both classification and segmentation during training and gradually increasing the weight of the pseudo-label loss in the final loss. This semi-supervised training scheme improves substantially on our supervised baseline for classification, allowing the model to generalize and perform better on the validation and test splits of the dataset. In this paper, we compare and contrast various methods and models for image classification and semantic segmentation on the FloodNet dataset.
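
A minimal sketch of the pseudo-labelling objective for the classification task (PyTorch; the confidence threshold, the linear ramp schedule, and all names are illustrative assumptions rather than the paper's exact recipe):

```python
import torch
import torch.nn.functional as F

def semi_supervised_loss(model, x_labelled, y_labelled, x_unlabelled,
                         epoch, ramp_epochs=30, max_weight=1.0, threshold=0.9):
    # Supervised term on the labelled batch.
    sup_loss = F.cross_entropy(model(x_labelled), y_labelled)

    # Generate hard pseudo-labels on the unlabelled batch.
    with torch.no_grad():
        probs = model(x_unlabelled).softmax(dim=1)
        conf, pseudo = probs.max(dim=1)
    mask = conf > threshold                      # keep only confident predictions
    if mask.any():
        unsup_loss = F.cross_entropy(model(x_unlabelled[mask]), pseudo[mask])
    else:
        unsup_loss = torch.zeros((), device=x_labelled.device)

    # Slowly ramp up how much the pseudo-label loss affects the final loss.
    weight = max_weight * min(1.0, epoch / ramp_epochs)
    return sup_loss + weight * unsup_loss
```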

Download here

talks

#1 A Deep Multi-Modal Explanation Model for Zero-Shot Learning (XZSL)

Zero-shot learning (ZSL) has attracted significant attention due to its ability to classify images from unseen classes. The paper addresses a new and challenging task, explainable zero-shot learning (XZSL), which aims to generate visual and textual explanations that support the classification decision. Link for the video

#3 XCiT: Cross-Covariance Image Transformers (Facebook AI)

After dominating Natural Language Processing, Transformers have recently taken over Computer Vision with the advent of Vision Transformers. However, the attention mechanism's quadratic complexity in the number of tokens means that Transformers do not scale well to high-resolution images. XCiT is a new Transformer architecture built around XCA, a transposed version of attention that reduces the complexity from quadratic to linear; at least on image data, it appears to perform on par with other models. What does this mean for the field? Is this even a transformer? What really matters in deep learning? Link for the video
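
The core mechanism is compact enough to sketch. A hedged PyTorch sketch of cross-covariance attention (simplified from the paper's description; names and defaults are illustrative, not the official implementation): the softmax is taken over a d x d channel-by-channel matrix built from L2-normalised queries and keys, rather than the N x N token-by-token matrix, so the cost grows linearly with the number of tokens.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossCovarianceAttention(nn.Module):
    def __init__(self, dim, heads=8):
        super().__init__()
        self.heads = heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        self.temperature = nn.Parameter(torch.ones(heads, 1, 1))

    def forward(self, x):                       # x: (batch B, tokens N, dim D)
        B, N, D = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Split heads and put the channel dimension before tokens: (B, h, d, N).
        shape = (B, N, self.heads, D // self.heads)
        q = q.view(shape).permute(0, 2, 3, 1)
        k = k.view(shape).permute(0, 2, 3, 1)
        v = v.view(shape).permute(0, 2, 3, 1)
        q = F.normalize(q, dim=-1)              # L2-normalise along tokens
        k = F.normalize(k, dim=-1)
        attn = (q @ k.transpose(-2, -1)) * self.temperature   # (B, h, d, d)
        attn = attn.softmax(dim=-1)
        out = (attn @ v).permute(0, 3, 1, 2).reshape(B, N, D)
        return self.proj(out)
```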

#5 CPC: Data-Efficient Image Recognition with Contrastive Predictive Coding

DeepMind's breakthrough paper 'Contrastive Predictive Coding 2.0' (CPC v2): with just 2% of the ImageNet data, CPC v2 surpasses AlexNet's Top-1 and Top-5 accuracies of 59.3% and 81.8% (reaching 60.4% and 83.9%), and with just 1% of the data it achieves 78.3% Top-5 accuracy, outperforming a supervised classifier trained on 5x more data. Trained on all available images (100%), it not only outperforms fully supervised systems by 3.2% Top-1 accuracy, but it still manages to outperform these supervised models with just 50% of the ImageNet data. Link for the video