CSE 290S: Preserving digital information for future generations
- Class will be held in 380 Engineering 2, starting on Thursday, April 6th.
- No class on Thursday, May 4th.
As many of you know, I am retiring from UCSC in June 2023, so this is the last class that I will teach as a regular faculty member. This course will cover a research area that I have worked on for nearly twenty years, and will describe the underpinnings of a critically important problem: how can we build systems that reliably and securely preserve information for future generations?
CSE 290S (Spring 2023) will cover the technologies and techniques necessary to preserve digital information for decades to centuries. We will first cover currently available technologies that might be used for archival storage, such as disk, tape, and flash, followed by a discussion of potential new technologies such as glass media and DNA. We will then cover the techniques we must use to ensure that the system preserves the integrity and security of the bits themselves. We will conclude by touching on issues related to understanding the stored bits.
Information
- Instructor: Professor Ethan L. Miller
- Quarter: Spring 2023
- When: Tuesday & Thursday 09:50–11:25 AM
- Where: 380 Engineering 2 (moved from Crown Classroom 105)
- Required readings: Papers (see below) will be available online
Requirements
Each student in the class will:
- Read the papers on the reading list, and be prepared to discuss them in class.
- Present several papers from the reading list. The exact number of papers will depend on the number of enrolled students.
- Complete a term-long project on a topic related to long-term data storage.
- Complete an open-book and open-note take-home final exam. The final exam will be submitted online, and will be due at the end of the final exam slot during exam week.
Schedule
The approximate week-by-week schedule is listed below. Papers will be added for each week, at least a week in advance. Note that the links to the papers may require access to various digital libraries. All papers will be freely downloadable from a campus-connected computer; if you're off-campus, you might need to use the UCSC VPN.
- Introduction
- Mary Baker, Kimberly Keeton, Sean Martin,
Why Traditional Storage Systems Don’t Help Us Save Stuff Forever,
Proceedings of the First IEEE Workshop on Hot Topics in System Dependability, June 2005.
- David S. H. Rosenthal, Thomas Robertson, Tom Lipkis, Vicky Reich, Seth Morabito,
Requirements for Digital Preservation Systems: A Bottom-Up Approach,
Technical Report arXiv:cs/0509018, September 2005.
- Mark W. Storer, Kevin Greenan, Ethan L. Miller,
Long-Term Threats to Secure Archives,
Proceedings of the 2nd ACM Workshop on Storage Security and Survivability (StorageSS 2006), October 2006.
- Existing storage technologies
- Yuhui Deng,
What is the future of disk drives, death or rebirth?,
ACM Computing Surveys 43(3), Article 23, April 2011.
- Nitin Agrawal, Vijayan Prabhakaran, Ted Wobber, John Davis, Mark Manasse, Rina Panigrahy,
Design tradeoffs for SSD performance,
Proceedings of the 2008 USENIX Annual Technical Conference (USENIX'08), June 2008.
- Kazuo Goda, Masaru Kitsuregawa,
The History of Storage Systems,
Proceedings of the IEEE 100, pages 1433–1440, April 2012.
- OPTIONAL: Inside Solid State Drives (SSDs)
- OPTIONAL: Solid-State-Drives (SSDs) Modeling
- New storage technologies
- Ian F. Adams, Ethan L. Miller, David S. H. Rosenthal,
Using Storage Class Memory for Archives with DAWN, a Durable Array of Wimpy Nodes,
Technical Report UCSC-SSRC-11-07, October 2011.
- Anderson, et al.,
Glass: A New Media for a New Era?,
Proceedings of HotStorage 2018, 2018.
- Luis Ceze, Jeff Nivala, Karin Strauss,
Molecular digital data storage using DNA,
Nature Reviews Genetics 20, pages 456–466, 2019. https://doi.org/10.1038/s41576-019-0125-3
- Andromachi Chatzieleftheriou, Ioan Stefanovici, Dushyanth Narayanan, Benn Thomsen, Antony Rowstron,
Could cloud storage be disrupted in the next decade?,
Proceedings of HotStorage 2020, 2020.
- Erasure coding
- Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin Li, Sergey Yekhanin,
Erasure Coding in Windows Azure Storage,
Proceedings of the 2012 USENIX Annual Technical Conference, 2012.
- James S. Plank, Mario Blaum, James L. Hafner,
SD Codes: Erasure Codes Designed for How Storage Systems Really Fail,
Proceedings of FAST 2013, February 2013.
- Yuchong Hu, Liangfeng Cheng, Qiaori Yao, Patrick P. C. Lee, Weichun Wang, Wei Chen,
Exploiting Combined Locality for Wide-Stripe Erasure Coding in Distributed Storage,
Proceedings of FAST 2021, February 2021.
- OPTIONAL: Myna Vajha, et al.,
Clay Codes: Moulding MDS Codes to Yield an MSR Code,
Proceedings of FAST 2018, February 2018.
- Building reliable storage systems
NOTE: no class on Thursday, May 4th.
- Andreas Haeberlen, Alan Mislove, Peter Druschel,
Glacier: Highly durable, decentralized storage despite massive correlated failures,
Proceedings of NSDI 2005, 2005.
- Shobana Balakrishnan, et al.,
Pelican: A Building Block for Exascale Cold Data Storage,
Proceedings of OSDI 2014, 2014.
- REFERENCE: Microsoft's Project Pelican
- Security and integrity
- Mahesh Kallahalla, Erik Riedel, Ram Swaminathan, Qian Wang, Kevin Fu, Plutus: Scalable secure file sharing on untrusted storage, Proceedings of FAST 2003, March 2003.
- Adi Shamir, How to Share a Secret, Communications of the ACM 22(11), November 1979, pages 612–613.
- Jason K. Resch, James S. Plank, AONT-RS: Blending Security and Performance in Dispersed Storage Systems, Proceedings of FAST 2011, February 2011.
- Petros Maniatis, Mary Baker, Secure History Preservation through Timeline Entanglement, Proceedings of the 11th USENIX Security Symposium, August 2002.
- In-depth study of proposed system designs
- Mark W. Storer, Kevin M. Greenan, Ethan L. Miller, Kaladhar Voruganti, POTSHARDS—A Secure, Long-Term Storage System, ACM Transactions on Storage 5(2), June 2009.
- Sinjoni Mukhopadhyay, Joel Frank, Daniel Bittman, Darrell D. E. Long, Ethan L. Miller, Efficient Reconstruction Techniques for Disaster Recovery in Secret-Split Datastores, Proceedings of MASCOTS 2018, September 2018.
- Maniatis, et al., The LOCKSS peer-to-peer digital preservation system, ACM Transactions on Computer Systems 23(1), February 2005.
- T. Schwarz, E. L. Miller,
Store, Forget, and Check: Using Algebraic Signatures to Check Remotely Administered Storage,
Proceedings of the 26th IEEE International Conference on Distributed Computing Systems (ICDCS 2006), July 2006.
- M. W. Storer, K. M. Greenan, E. L. Miller, K. Voruganti,
Pergamum: Replacing Tape with Energy Efficient, Reliable, Disk-Based Archival Storage,
Proceedings of FAST 2008, February 2008.
- In-depth study of proposed system designs
- J. J. Wylie, et al., Survivable information storage systems, IEEE Computer 33(8), August 2000.
- A. Bessani, et al., DepSky: Dependable and Secure Storage in a Cloud-of-Clouds, ACM Transactions on Storage 9(4), November 2013.
- A. Celesti, M. Fazio, M. Villari, A. Puliafito, Adding long-term availability, obfuscation, and encryption to multi-cloud storage systems, Journal of Network and Computer Applications 59, January 2016.
- J. Braun, et al., LINCOS - A Storage System Providing Long-Term Integrity, Authenticity, and Confidentiality, Proceedings of the 13th ACM Asia Conference on Computer and Communications Security (ASIACCS 2017), 2017.
- OPTIONAL: S. Rhea, et al., Pond: the OceanStore Prototype, Proceedings of FAST ’03: 2nd USENIX Conference on File and Storage Technologies, February 2003.
- Future directions for research
- No class June 6 and June 8 (SYSTOR 2023)
No additional readings for this week. However, please make sure that you've read (and refreshed your memory!) all of the readings for the quarter. We will be discussing future directions for the research area, and covering unanswered questions as well as possible directions for answering them.