CSE Colloquium: High-Performance and Cost-Effective Storage Systems for Supporting Big Data

ABSTRACT: With the wide use of social media, e-business, smartphones, and smart home kits, data is generated everywhere, constructing a new world that is all about data: we are in the big data era. Storage systems act as the keystone for ensuring data persistency in today’s big data infrastructure. Due to the explosion in data scale, achieving a better tradeoff between performance and cost-effectiveness is one of the main challenges in designing and optimizing storage systems.

In this talk, I will present my research addressing these tradeoff challenges in primary storage, backup systems, and storage systems for AI/ML services. I will primarily discuss workload characterization, modeling, and benchmarking of the widely used key-value store RocksDB in primary storage and AI/ML systems. The key findings motivate a follow-up study on improving the performance of the key-value store HBase, which explores applying an in-storage-computing-based architecture to effectively reduce network traffic in a compute-storage disaggregated infrastructure. I will also cover my research on improving the read performance of data deduplication in backup systems for big data applications, including a hybrid look-ahead caching scheme and a sliding look-back window assisted data rewrite scheme. Finally, I will introduce my vision for future research on storage systems for new memory/storage devices, AI/ML systems, and new data infrastructure for wireless data centers.

BIOGRAPHY: Dr. Zhichao Cao is a research scientist at Facebook, mainly working on data infrastructure, storage systems, and databases. He earned his bachelor’s degree in Automation from Tsinghua University in China and completed his Ph.D. in computer science at the University of Minnesota in 2020, supervised by Prof. David H.C. Du. Zhichao’s research focuses on designing and optimizing high-performance and cost-effective storage systems for big data. Specifically, he works on tiered file systems, key-value stores, secondary storage systems, systems for new storage devices, and storage systems for AI/ML platforms. Zhichao has published more than 15 papers in major conferences and journals, including USENIX FAST, USENIX HotStorage, IEEE MASCOTS, Computers in Industry, IEEE Transactions on Computers, and ACM Transactions on Storage. He also has rich industry experience through multiple research internships and research collaborations at leading companies such as NetApp, HPE, Veritas, and Facebook. You can find out more about Zhichao at https://www-users.cs.umn.edu/~caoxx380/.

 

Event Contact: Jack Sampson

 
 
