Courses and Projects

Spring 2024
Winter 2023-2024
The Project in Storage Systems (236388) has Operating Systems as a prerequisite and can be taken individually or in pairs.
Optimizing Data Deduplication
Data deduplication is one of the most effective ways to reduce the size of data stored in large-scale systems, and is widely used today. The process of deduplication consists of identifying duplicate chunks of data in different files, storing a single copy of each unique chunk, and replacing the duplicate chunks with pointers to this copy.
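
The process described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: fixed-size 4 KiB chunks, SHA-256 fingerprints, and an in-memory dictionary as the chunk store. The names `deduplicate`, `restore`, and `CHUNK_SIZE` are hypothetical and chosen for this example. Production systems typically use more sophisticated chunking and persistent indexes.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size in bytes (illustrative choice)


def deduplicate(files, chunk_size=CHUNK_SIZE):
    """Deduplicate a mapping of file name -> bytes.

    Returns (store, recipes):
      store   maps a chunk fingerprint to the single stored copy of that chunk
      recipes maps a file name to the ordered list of fingerprints
              (the "pointers") needed to rebuild the file
    """
    store = {}
    recipes = {}
    for name, data in files.items():
        recipe = []
        for offset in range(0, len(data), chunk_size):
            chunk = data[offset:offset + chunk_size]
            fingerprint = hashlib.sha256(chunk).hexdigest()
            # Keep only the first copy of each unique chunk.
            store.setdefault(fingerprint, chunk)
            # The file itself only records a pointer to the stored copy.
            recipe.append(fingerprint)
        recipes[name] = recipe
    return store, recipes


def restore(store, recipe):
    """Rebuild a file's contents by following its recipe of fingerprints."""
    return b"".join(store[fingerprint] for fingerprint in recipe)
```

Calling `deduplicate` on two files that share a chunk stores the shared chunk once, and `restore` returns the original bytes of each file from its recipe.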

Example projects: evaluating new techniques for data chunking and duplicate identification, evaluating the tradeoff between space utilization and performance, analyzing duplication characteristics of production storage snapshots, and investigating algorithms for managing volume capacities.

Visualizing Flash Performance with SSDPlayer
SSDPlayer is a visualization tool which enables system designers to view how the layout of data on the flash media progresses over time according to the implemented management policy. This visualization allows them to understand complex data-movement processes and to optimize their system in light of this understanding.

Projects related to SSDPlayer focus on enhancing the tool with additional capabilities, so that more complex policies and behaviors can be visualized and analyzed.

Behind the Scenes of Flash-Based SSD Performance
Designing and optimizing flash-based SSDs is challenging because of the special characteristics of flash media and the high complexity of the firmware that manages it at the device level. New optimizations and techniques should address both low-level hardware access and high-level application types and objectives.

Example projects: implementing alternative management policies on an SSD simulator, an experimental SSD prototype, or flash chips; evaluating new and existing policies on various workloads and data types; and comparing the performance and durability of flash chips of different technologies.

Analysis of Production Storage Workloads
I/O workload analysis identifies important workload characteristics, which are then used for storage allocation and provisioning, workload consolidation, cache management, and application-specific optimizations. As SSDs become a dominant building block in many storage systems, I/O workload analysis should be redesigned with SSDs in mind, to examine new related metrics.
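
As a rough illustration of the kind of characteristics such an analysis extracts, the sketch below summarizes a trace in Python. It assumes a hypothetical trace format in which each record is an `(op, offset, size)` tuple with `op` being `"R"` or `"W"`. Real production traces carry more fields and differ in format, and the function name `summarize_trace` is invented for this example. Total bytes written is included because write volume is a metric of particular interest for flash wear.

```python
from collections import Counter


def summarize_trace(records):
    """Summarize an I/O trace given as (op, offset, size) tuples.

    op is "R" for a read or "W" for a write (assumed format);
    offset and size are in bytes. The offset is not used by this summary.
    """
    ops = Counter()          # number of requests per operation type
    total_bytes = Counter()  # bytes transferred per operation type
    for op, _offset, size in records:
        ops[op] += 1
        total_bytes[op] += size

    requests = sum(ops.values())
    return {
        "requests": requests,
        "read_fraction": ops["R"] / requests if requests else 0.0,
        "mean_request_bytes": (
            sum(total_bytes.values()) / requests if requests else 0.0
        ),
        "bytes_written": total_bytes["W"],
    }
```

The same loop can be extended with further counters, for example per-offset access counts, to study locality.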

Example projects: processing and analyzing I/O traces from production and research servers, validating known results on new workloads, investigating techniques for online characterization, and evaluating new optimizations based on workload characteristics. 

Additional exploratory projects
I am continuously exploring new research directions in topics such as new memory technologies and complex hierarchies, erasure coding in distributed systems, edge computing and IoT, specialized data structures, and more. Additional projects may be available but not yet listed on this page, so contact me for details.
Past Courses