HiPC 2017 hosted the following two IRUS sessions on Day 2 (Tuesday, December 19) of the conference.
Session 1: Practical Applications of Machine Learning in the e-commerce sector
When : Tuesday, December 19, 2017 – 10:00 AM – 12:00 PM
Presentation # 1 : Anomaly Detection using Tensor Decomposition
We look at detecting anomalies in the context of e-commerce: detecting sellers who may have solicited fake reviews from customers to artificially increase the rating of their product(s). Since most of the connections between a seller’s product(s) and the customers who buy them are inherently random and spread across time, the problem boils down to detecting non-random (or anomalous) connections. This manifests as detecting dense bipartite cores between sellers and customers that form in a relatively short period of time. We apply a scalable tensor decomposition technique to detect such dense bipartite cores.
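The idea of a dense bipartite core showing up as a dominant tensor component can be illustrated with a small sketch. This is not the speaker's method: the abstract does not say which decomposition is used, so the code below assumes a plain rank-1 CP fit by alternating power updates on a seller x customer x time-bin tensor. The toy review data, the function names (`rank1_cp`, `heavy_indices`), and the 0.5 cutoff are all hypothetical.

```python
import math

def normalize(vec):
    """Scale a vector to unit Euclidean length (left unchanged if all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def rank1_cp(entries, shape, iters=50):
    """Rank-1 CP fit of a sparse 3-way tensor by alternating power updates.

    entries maps (seller, customer, time_bin) -> review count.
    Returns one non-negative loading vector per mode.
    """
    a, b, c = (normalize([1.0] * n) for n in shape)
    for _ in range(iters):
        new_a = [0.0] * shape[0]
        for (i, j, k), x in entries.items():
            new_a[i] += x * b[j] * c[k]
        a = normalize(new_a)
        new_b = [0.0] * shape[1]
        for (i, j, k), x in entries.items():
            new_b[j] += x * a[i] * c[k]
        b = normalize(new_b)
        new_c = [0.0] * shape[2]
        for (i, j, k), x in entries.items():
            new_c[k] += x * a[i] * b[j]
        c = normalize(new_c)
    return a, b, c

def heavy_indices(loadings, fraction=0.5):
    """Indices whose loading is at least `fraction` of the largest loading."""
    cutoff = fraction * max(loadings)
    return [i for i, x in enumerate(loadings) if x >= cutoff]

# Toy data (made up): 5 sellers, 6 customers, 4 time bins.
# Sellers 0-1 receive reviews from customers 0-2, all inside time bin 2.
entries = {(s, cust, 2): 1.0 for s in (0, 1) for cust in (0, 1, 2)}
# Scattered "organic" reviews spread over sellers, customers and time.
for key in [(2, 3, 0), (3, 4, 1), (4, 5, 3), (2, 5, 1), (3, 0, 3)]:
    entries[key] = 1.0

sellers, customers, times = rank1_cp(entries, shape=(5, 6, 4))
suspicious_sellers = heavy_indices(sellers)
suspicious_customers = heavy_indices(customers)
suspicious_time_bins = heavy_indices(times)
```

On this toy tensor the dominant component concentrates on the planted block, so the heavy loadings pick out the sellers, customers, and time bin that form the dense core. A production system would need further components, sparse storage, and a principled threshold, none of which the abstract specifies.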
Anil R. Yelundur has been working as an Applied Scientist at Amazon for over 1.5 years. He has over ten years of experience in the field of Machine Learning and has worked on problems ranging from topic modeling to anomaly detection in high-dimensional data and pattern recognition. His current focus includes applying Bayesian techniques and natural gradient learning to anomaly detection.
Presentation # 2 : Personalization via Persona-ization for Merchandising Lifestyle Articles
Personalised merchandising of lifestyle articles is of immediate interest to the e-commerce industry, and is beginning to attract the attention of the research community. Commonly adopted strategies, such as recommending popular items and collaborative filtering, are inadequate for this vertical for several reasons. Firstly, users have their own personal preferences over items, referred to as styles, leading to the long-tail phenomenon. Secondly, each user displays a multitude of personas, with each persona demonstrating a unique preference over items dictated by the shopping need (e.g. shopping for fashion accessories for leisure and formal wear for business). Merchandising in this vertical is crucially dependent on discovering styles for each of the multitude of personas. We posit a generative model which represents each user by a simplex over personas, where each persona is described as a preference over prevailing styles, which, in turn, are modelled as distributions over items themselves. The choice of simplex and the long-tail nature necessitate the use of a stick-breaking process, for which we develop an efficient collapsed Gibbs sampler. Trained on large-scale behavioural logs spanning more than half a million sessions collected from an e-commerce portal, the proposed algorithm outperforms previous baselines by a large margin of 35% in identifying personas. Consequently, it comprehensively outperforms several competitive baselines on the task of recommending from a catalogue of roughly 150 thousand lifestyle articles, improving the recommendation quality as measured with auROC by 12.23%, in addition to aiding the interpretability of the uncovered individual and popular styles, thus advancing our understanding of the underlying phenomenon.
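For readers unfamiliar with the stick-breaking process named above, the following minimal sketch shows how it produces a long-tailed set of weights that sum to at most one. It assumes the standard construction with Beta(1, alpha) break fractions. The abstract does not state which variant the model uses, and the collapsed Gibbs sampler is not reproduced here. The function name, `alpha` value, and truncation length are hypothetical.

```python
import random

def stick_breaking(alpha, num_sticks, rng):
    """Break a unit-length stick `num_sticks` times.

    Each break takes a Beta(1, alpha) fraction of whatever stick remains.
    Returns the list of piece lengths and the unbroken remainder.
    """
    remaining = 1.0
    weights = []
    for _ in range(num_sticks):
        fraction = rng.betavariate(1.0, alpha)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    return weights, remaining

rng = random.Random(0)  # seeded so repeated runs draw the same weights
weights, leftover = stick_breaking(alpha=2.0, num_sticks=10, rng=rng)
```

Because each piece is carved from an ever-shrinking remainder, the expected piece sizes decay geometrically, which is one way to model a few popular styles alongside a long tail of rare ones.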
Samik Datta earned his Master’s degree in Computer Science & Engineering from the Indian Institute of Science in 2008, where he specialised in applied machine learning under the tutelage of Professor Chiranjib Bhattacharyya. Subsequently, from 2008 to 2014, Samik worked at Bell Laboratories, India, as a Member of Technical Staff, where he focused on applied machine learning and applied algorithms research in the context of social, communication and transportation networks. Since 2014, Samik has been with Flipkart, India, where he presently serves as a Principal Data Scientist and is responsible for personalising the e-commerce experience by constructing rich customer profiles, as well as personalising key customer-facing constructs. While at Flipkart, Samik has additionally worked on a diverse array of e-commerce-inspired problems, ranging from weakly-supervised topic and sentiment classification for Twitter to product comparison backed by user-generated content, resulting in four refereed publications in prestigious venues.
Presentation # 3 : Answering Questions on Products
Amazon product pages contain a wealth of information from various sources such as the product title, specifications, description, reviews, and community questions and answers. The rising amount of content on the detail pages has made the discoverability of relevant product information a challenge for our customers. In addition to content growth, customers face increased complexity in product evaluation due to a growing feature density on detail pages. The problem is even more glaring on small form factor devices such as mobile phones. Motivated by this, we set out to improve access to product information of interest to our customers. In this talk, we present our deep learning-based question-answering bot, which responds to product-feature-related customer questions with the most relevant content on our product detail pages. We will discuss our deep learning architecture and demonstrate its efficacy through both quantitative and qualitative evaluation.
Ashish Kulkarni is a Machine Learning Scientist in the India Machine Learning team at Amazon. He has 5+ years of experience in the field of AI and machine learning and over a decade of industry experience. He is a PhD candidate at IIT Bombay and has published in several international conferences, including IJCAI, AAAI, and PAKDD.
Presentation # 4 : Estimating the performance of online A/B tests on historical logs
A/B tests are popular in industry because of their simplicity and ability to compare policies on real traffic. Conducting online A/B tests is expensive in terms of both time and customer experience, which creates the need for alternatives. Offline policy evaluation is a technique for estimating the value of an individual policy by simulating it on historical logs. In this talk, we will discuss how offline policy evaluation techniques can help predict the outcome of A/B tests without running them in production. Specifically, we will look into the challenges faced while applying the Inverse Propensity Score estimator and compare its performance with other policy evaluation techniques on real datasets within Amazon.
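As a rough illustration of the Inverse Propensity Score estimator mentioned above, the sketch below reweights each logged reward by the ratio of the new policy's probability of taking the logged action to the logging policy's probability of taking it. The log format, the function names, and the toy data are assumptions made for this example and are not taken from the talk.

```python
def ips_estimate(logs, target_policy):
    """Inverse Propensity Score estimate of a policy's average reward.

    logs: list of (context, action, reward, logging_prob) tuples, where
          logging_prob is the probability with which the logging policy
          chose `action` in `context` (must be > 0).
    target_policy(context, action): probability that the policy being
          evaluated would choose `action` in `context`.
    """
    total = 0.0
    for context, action, reward, logging_prob in logs:
        weight = target_policy(context, action) / logging_prob
        total += weight * reward
    return total / len(logs)

# Toy logs (made up): the logging policy picked variant 0 or 1 uniformly
# at random, and only variant 1 ever produced a reward.
logs = [
    ("ctx", 0, 0.0, 0.5),
    ("ctx", 1, 1.0, 0.5),
    ("ctx", 1, 1.0, 0.5),
    ("ctx", 0, 0.0, 0.5),
]

def always_show_variant_one(context, action):
    """Hypothetical candidate policy that always shows variant 1."""
    return 1.0 if action == 1 else 0.0

estimated_value = ips_estimate(logs, always_show_variant_one)
```

Here the estimate equals 1.0, the reward the candidate policy would earn if it always showed variant 1. One well-known difficulty is that the weights grow large when logging probabilities are small, which inflates the variance of the estimate.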
Saurabh is a Machine Learning Scientist at Amazon. He obtained his Master’s degree from IIT Bombay, with a major in NLP and ML. Over the last seven years, Saurabh has worked on recommendation systems, supervised learning, text mining and, more recently, policy evaluation and bandits. Prior to Amazon, Saurabh was associated with America Online, applying his knowledge to spam filters in AOL Mail.
Session 2: Performance challenge in Deep Learning and Scientific Computing
When : Tuesday, December 19, 2017 – 3:15 PM – 5:15 PM
Presentation # 1 : Deep Learning using Xilinx FPGAs
Deep learning models and their combinations are now being used to solve a variety of problems. Image classification, speech recognition, and language translation are just a few of the areas where deep learning algorithms are applied. In this presentation, I will focus on the implementation of some of the common deep learning models for inference using Xilinx FPGAs. The presentation will focus on efficient implementation using lower-precision arithmetic, customized memory hierarchy, and model pruning to address broader figures of merit, namely speed, latency, energy, and accuracy. I will give examples of high-performance, programmable, variable-precision, configurable CNN and RNN overlays that have been deployed in real applications in the cloud and at the edge.
Ashish Sirasao is a Distinguished Engineer in the Xilinx Software and IP team in San Jose, California. His team is currently involved in defining and implementing methodologies and hardware architectures for high-performance accelerators in the areas of Deep Learning, Data Analytics, Computer Vision, and Video Codecs on Xilinx FPGAs. Ashish has a Master’s degree in EE from the Indian Institute of Technology, Powai, Mumbai, India. He has broad experience in developing design tools, applications, and methodologies to harness the flexibility and reconfigurability of FPGAs.
Presentation # 2 : Computing Challenges in HEP for WLHC grid
As CERN prepares to increase the luminosity of the particle beam for the HL-LHC, predictions show that computing demand will outgrow our conservative scaling estimates by over ten times. Fortunately, we are talking about a time scale of roughly ten years to develop new techniques and novel solutions to address this gap in compute resources. Experiments at CERN face a unique scenario wherein they need to scale both latency-sensitive workloads, such as data acquisition from the detectors, and throughput-based ones, such as simulation and reconstruction of high-level events and physics processes. In this talk, we cover some of the ongoing research at Tier-0 at CERN, which investigates several aspects of throughput-sensitive workloads that consume significant compute cycles.
Dr. Servesh Muralidharan is a computer scientist working as a senior fellow in the CERN IT department. He investigates optimisations suitable for HEP computations performed by the various experiments at CERN. His research focuses predominantly on performance characterisation of features of modern many-core architectures and how they could help improve overall compute efficiency. Dr. Muralidharan completed his PhD in computer science at Trinity College Dublin in 2015, followed by a postdoc at ICHEC, Ireland’s national supercomputing centre, before starting his fellowship at CERN.