Office of the CTO, Cray Inc.
Sr. Data Analytics Architect
Bio: Rangan Sukumar is a Senior Analytics Architect in the CTO’s office at Cray Inc. His role is threefold: (i) solutions architect, creating bleeding-edge solutions for scientific and enterprise problems in the long tail of the Big Data market that require scale and performance beyond what cloud computing offers; (ii) technology visionary, designing the roadmap for analytic products by evaluating customer requirements and aligning them with emerging hardware and software technologies; and (iii) analytics evangelist, demonstrating what Big Data and HPC can do for data-centric organizations. Before joining Cray, he served as a group leader, data scientist and artificial intelligence/machine learning researcher at the Oak Ridge National Laboratory, scaling algorithms on unique supercomputing infrastructure. He has over 70 publications in the areas of disparate data collection, organization, processing, integration, fusion, analysis and inference, applied to a wide variety of domains such as healthcare, social network analysis, electric grid modernization and public policy informatics.
Title: Medical Discoveries when Big Data, AI & HPC Converge
Abstract: This talk is about the convergence of high performance computing (HPC) technologies for Big Data problems and artificial intelligence workflows. Combining the HPC interconnect, HPC best practices and communication collectives achieves a convergence that: (i) enables processing of 1000x bigger graph datasets up to 100x faster than competing tools on commodity hardware (e.g., GraphX); (ii) provides a 2-26x speed-up on matrix factorization workloads compared to cloud-friendly Apache Spark; and (iii) promises over 90% scaling efficiency on deep learning workloads (i.e., a potential reduction in training time from days to hours). When assembled into data science workflows, these capabilities enable creative applications for discovering domain-specific insights.
The talk will delve deeper into a use case that applies artificial intelligence to medical ‘Big Data’ represented as massive, ad hoc, heterogeneous graph networks. We will present the Cray Graph Engine (CGE) as a demonstration of the convergence of HPC and AI for Big Data, capable of: (i) speeding up ad hoc searches (e.g., a queryable semantic database) and graph-theoretic mining (e.g., graph algorithms); (ii) scaling to massive data sizes; and (iii) providing new functionality for temporal, streaming and snapshot analysis of massive graphs. We will demonstrate this convergence by applying graph theory to a semantic database extracted from PubMed, containing over 90 million knowledge nuggets from over 27 million publications in the medical literature, and show how this capability was used as: (i) a demonstration of “explainable” artificial intelligence that helped clinical/medical researchers at the Historical Clinico-pathological Conference in Baltimore, USA, solve mystery illnesses; (ii) a hypothesis-generation tool that discovered the relationship between beta-blocker treatment and diabetic retinopathy at The University of Tennessee Health Science Center, Memphis, USA; and (iii) a knowledge browser that revealed xylene as an environmental carcinogen at the Oak Ridge National Laboratory, USA.