CSE Colloquium: Stream Processing Systems for Emerging Trends
Zoom Information
Join from PC, Mac, Linux, iOS or Android: https://psu.zoom.us/j/99286975522?pwd=d2FQdUNEbEplRmppWkUrd2crQkN4QT09 Password: 066132
or iPhone one-tap (US Toll): +16468769923,99286975522# or +13017158592,99286975522#
or Telephone: Dial: +1 646 876 9923 (US Toll) +1 301 715 8592 (US Toll) +1 312 626 6799 (US Toll) +1 669 900 6833 (US Toll) +1 253 215 8782 (US Toll) +1 346 248 7799 (US Toll) Meeting ID: 992 8697 5522 Password: 066132 International numbers available: https://psu.zoom.us/u/auXVyVxZh
ABSTRACT: Stream processing was proposed and popularized as a “technology like Hadoop but can give you results faster”, which lets users query a continuous data stream and get results shortly after the data is received. For that reason, stream processing technology has become a critical building block of many applications, such as making business decisions from marketing streams, identifying spam campaigns from social network streams, predicting tornadoes and storms from radar streams, and analyzing genomes in different labs and countries to track the sources of a potential epidemic. However, state-of-the-art solutions have predominantly centered on stateless stream processing, leaving another urgent trend, stateful stream processing, much less explored. A driving need is that future stream applications must store and update state along with their processing, and process live streams from massive, geo-distributed data sets in a timely fashion. Unfortunately, existing systems are mainly designed for low-latency intra-datacenter settings. They do not scale well for stream applications that maintain large distributed state across geo-distributed datacenters, and they suffer from a significant centralized bottleneck and high latency.
In this talk, I will present a next-generation, scalable, geo-distributed stateful stream processing system. (1) At the architecture layer, I will introduce a decentralized “many masters/many workers” architecture that substantially improves the scalability of stream processing systems.
(2) At the operator layer, I will present an in-memory data structure for storing state that minimizes memory overhead. (3) At the mechanism layer, I will present a fragment-based parallel recovery mechanism that recovers large distributed state by leveraging distributed hash table (DHT) based overlay networks and erasure codes. (4) Finally, I will outline a future research agenda on developing scalable stream processing systems for emerging trends.
BIOGRAPHY: Dr. Liting Hu is an Assistant Professor of Computer Science in the School of Computing and Information Sciences at Florida International University (FIU). She received her Ph.D. in Computer Science from Georgia Institute of Technology in 2016 under the supervision of Dr. Karsten Schwan. Her research interests span distributed systems, cloud and edge computing, and system virtualization, with a focus on building scalable stream processing systems. She directs the Experimental and Virtualized Systems (ELVES) Research Lab, where she conducts experimental computer systems research. Examples include stream processing systems (with Spark Streaming, Storm, Flink), container as a service (with Docker and Kubernetes), identifying threats (e.g., fake news, rumors, social bots) in online social networks, and resource management in large-scale data centers (with Xen and KVM). She has served on numerous IEEE/ACM program committees and as a peer reviewer for more than a dozen journals. She interned at VMware, IBM Research, Microsoft Research Asia, and Intel Labs at CMU. Her research has been funded by the NSF, the Department of Homeland Security, and Cyber Florida. She was the recipient of an NSF SPX Award in 2019 and an NSF CAREER Award in 2020.
Event Contact: Jack Sampson