The internet has become one of the most common things in our lives. Every day we engage with it in some fashion. However, with something so common, we often overlook the most obvious questions.
Where does all of the data I put into the internet go? How long will it stay there? Will it be there when I come back to it?
When you start to look at these questions, the answers are surprising. Most people assume that what they put on the internet is static and will be there whenever they need it.
In reality, the internet is anything but static; it is better described as fluid.
Data is leaving the internet almost as fast as it is being added, and the lifespan of most data online is less than three years.
The Problem of Fluidity with the Internet
If you go back to old blogs and websites, you will notice more and more 404 errors. A 404 (Not Found) error means the server can no longer find the page the link points to, usually because that page has been removed.
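Checking a link for this kind of rot is simple enough to automate. Below is a minimal Python sketch, using only the standard library, that sends an HTTP HEAD request to a URL and labels the result. The function names `check_link` and `classify_status` and the labels they return are hypothetical, chosen for illustration, and treating both 404 (Not Found) and 410 (Gone) as "rotted" is our own assumption.

```python
import urllib.error
import urllib.request


def classify_status(status: int) -> str:
    """Map an HTTP status code to a coarse, hypothetical link-health label."""
    if status in (404, 410):  # 404 Not Found, 410 Gone: the page is no longer there
        return "rotted"
    if 200 <= status < 400:   # success or redirect (urlopen follows redirects itself)
        return "alive"
    return "error"            # other 4xx/5xx responses, possibly temporary


def check_link(url: str, timeout: float = 10.0) -> str:
    """Send a HEAD request to `url` and classify the response."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return classify_status(response.status)
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; .code holds the status
        return classify_status(err.code)
    except OSError:
        # URLError (DNS failure, refused connection) and timeouts land here:
        # the whole site may be gone, not just the page
        return "unreachable"
```

For example, `check_link("https://example.com/some-old-post")` would return `"rotted"` if the server answers with a 404. Some servers refuse HEAD requests outright (for example with 405 Method Not Allowed), which this sketch reports as `"error"`; a more thorough checker might retry those URLs with a GET.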
Web companies have come and gone, and so has the data uploaded to them.
For example, in 2019 MySpace revealed that a failed server migration had deleted all music uploaded to the site between 2003 and 2015: roughly 50 million songs, many of which had been stored nowhere else.
On top of that, web services are frequently hacked, leaking users’ personal information. Even major elections have been tampered with online through disinformation, social engineering, and other ways of manipulating the web’s algorithms.
A few facts about the fragility of today’s internet:
- Information is transient: one third of all the information on the internet is changed or gone within two years of being posted, and after 20 years most of it has turned over.
- Link rot is common: more and more links point to web pages that no longer exist. Of the links on the internet in 1998, 72 percent are now dead, and more than half of all New York Times articles that contain deep links have at least one rotted link.
- Harvard study: more than 70% of the URLs in three legal journals, and 50% of the URLs in U.S. Supreme Court opinions, suffer from reference rot.
- Changing terms of service: companies continually change their terms of service from what you originally signed up for. Do most people know that many companies’ terms grant them broad rights to the data you upload to their platforms?
- Lost data: there have been high-profile cases of lost data at social media giants Facebook, Instagram, Twitter, and MySpace, and at tech titans Dropbox and Google.
All of this shows that our personal and business data is not as safe as we think it is. There are short-term risks of loss, but also long-term questions about how we will actually pass on some of our most valuable resources to future generations.
Read a Related Article: Internet is a Collective Hallucination: the rotting of the Internet and Loss of Data
Read a Related Article: Raiders of the Lost Web: How a Pulitzer Prize finalist’s 34-page essay got lost from the web
Is there a way forward to a more stable internet?
Arweave and the goal of long-term data storage
Sam Williams, the founder of Arweave, and the Arweave team set out with a mission to solve the problem of long-term data storage.
They looked at the current state of how information was kept on the internet and how people and companies managed their own files. What they saw was a very fragile system.
The new Library of Alexandria
To counter the information loss the digital world was experiencing, as well as the growing (and long-standing) censorship of data, Arweave turned to the classical Library of Alexandria for inspiration.
The Library of Alexandria was founded in the early 3rd century BC as a storehouse of the world’s cumulative knowledge. It became the largest library of its time, holding by some estimates as many as 400,000 scrolls, including many of the world’s greatest literary and scientific treasures. The Library remained in existence for more than 400 years before its decline.
Arweave seeks to resurrect the Library of Alexandria, but in digital form, where it will actually last!
An Archive for the Ages
To solve the problem of long-term data storage, the Arweave team set out to build on blockchain technology and innovate upon it to create permanent storage at an affordable cost.
Blockchain provided the building blocks of what Arweave needed to provide reliable long-term storage: immutability (data that doesn’t change), decentralization (not controlled by third parties), and ownership (people would be in charge of their own information). There are those words again from our previous section!
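To make "immutability" a little more concrete: one common way blockchains make stored data tamper-evident is to link blocks by hash, so that each block records the hash of the block before it. The Python sketch below is a deliberately simplified, hypothetical illustration of that idea (the `Block`, `build_chain`, and `is_valid` names are ours). It is not how Bitcoin or Arweave actually structure their blocks; real networks add consensus rules and much more.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Block:
    data: str
    prev_hash: str  # hex SHA-256 of the previous block

    def digest(self) -> str:
        # The hash covers this block's data AND the previous block's hash,
        # so changing any earlier block changes every hash after it
        payload = (self.prev_hash + self.data).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def build_chain(items: list[str]) -> list[Block]:
    """Chain the items together, each block pointing at its predecessor's hash."""
    chain: list[Block] = []
    prev = "0" * 64  # placeholder predecessor for the first block
    for item in items:
        block = Block(item, prev)
        chain.append(block)
        prev = block.digest()
    return chain


def is_valid(chain: list[Block]) -> bool:
    """Return True if every block still points at the real hash of the one before it."""
    for earlier, later in zip(chain, chain[1:]):
        if later.prev_hash != earlier.digest():
            return False
    return True
```

Building a chain from `["doc A", "doc B", "doc C"]` and then editing the first block’s data makes `is_valid` return `False`, because the second block still records the old hash. Hashing on its own only detects changes; in a real network it is the many independent copies of the chain and the consensus rules that make rewriting history impractical.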
However, looking at Bitcoin, the team saw some obvious drawbacks that would need to be overcome:
- It is extremely energy-intensive. Operating the Bitcoin network now consumes as much energy each year as a country like Austria or New Zealand.
- It is poorly suited to storing data, largely because of the expense, among other factors. As of early 2022, the entire Bitcoin blockchain held only about 390 GB, which you and I could easily store at home on a personal hard drive.
- Its throughput is low: around 5 transactions per second.
Arweave needed to overcome these obstacles without sacrificing any of the decentralization or security of the blockchain. What breakthroughs were required to make permanent storage happen? How did they use blockchain to solve this?