CSE Colloquium: High-Performance and Cost-Effective Storage Systems for Supporting Big Data
ABSTRACT: With the widespread use of social media, e-business, smartphones, and smart home kits, data is being generated everywhere, and we have entered the big data era. Storage systems are the keystone that ensures data persistence in today's big data infrastructure. Because of the explosive growth in data scale, achieving a better tradeoff between performance and cost-effectiveness is one of the main challenges in designing and optimizing storage systems.
In this talk, I will present my research on these tradeoff challenges in primary storage, backup systems, and storage systems for AI/ML services. I will primarily discuss workload characterization, modeling, and benchmarking of RocksDB, a widely used key-value store, in primary storage and AI/ML systems. The key findings motivate a follow-up study on improving the performance of the HBase key-value store, which explores applying an in-storage computing architecture to reduce network traffic in a compute-storage disaggregated infrastructure. I will also cover my work on improving the read performance of data deduplication in backup systems for big data applications, including a hybrid look-ahead caching scheme and a data rewrite scheme assisted by a sliding look-back window. Finally, I will outline my vision for future research on storage systems for new memory/storage devices, AI/ML systems, and new data infrastructure for wireless data centers.
BIOGRAPHY: Dr. Zhichao Cao is a research scientist at Facebook, working mainly on data infrastructure, storage systems, and databases. He received his bachelor's degree in Automation from Tsinghua University in China and his Ph.D. in computer science from the University of Minnesota in 2020, under the supervision of Prof. David H.C. Du. Zhichao's research focuses on designing and optimizing high-performance and cost-effective storage systems for big data. Specifically, he works on tiered file systems, key-value stores, secondary storage systems, systems for new storage devices, and storage systems for AI/ML platforms. Zhichao has published more than 15 papers in major conferences and journals, including USENIX FAST, USENIX HotStorage, IEEE MASCOTS, Computers in Industry, IEEE Transactions on Computers, and ACM Transactions on Storage. He also has extensive industry experience through multiple research internships and research collaborations with leading companies such as NetApp, HPE, Veritas, and Facebook. You can find out more about Zhichao at https://www-users.cs.umn.edu/~caoxx380/.
Event Contact: Jack Sampson