On December 19, 2020, the Filecoin network experienced a chain halt: for a period of time new blocks could still be created, but miners could not reach consensus on the resulting state, with each computing a different value. Thanks to the rapid response from community members, miners, and developers, a fix was released within four hours, and the network achieved full recovery within seven hours.

The underlying issue was potentially non-deterministic iteration over a map of objects in the storage miner actor implementation. The actors are implemented in Go, and the iteration order of Go maps is known to be non-deterministic. For that reason the actors always sort the results of a map iteration before using them (this is enforced by static analysis). Unfortunately, a bug in the comparison function used when sorting two such maps resulted in an invalid sort (see #1335). As a result, different nodes processed the map entries in different orders, leading to different results and different gas consumption.

This code path can only be reached by (a) a miner declaring multiple sectors terminated at once, or (b) a miner recovering from faults across multiple partitions at once. (Two other code paths also reach this point, but are extremely unlikely to matter in practice.) Neither of these paths had been exercised on mainnet before with multiple sectors or partitions, which is the condition that exposes the non-determinism. The simultaneous termination of multiple sectors triggered this halt.

Most importantly, it should be emphasized that no data was lost during the outage. While the halt temporarily inhibited transactions on the network, all data held by storage providers remained safe and was available once the network was back up and running. In addition, it is worth noting that the Filecoin protocol specification provides for data retrieval even in the event of a chain outage.
In other words: on-chain transactions were not possible for the duration of the event, but the core functionality of the Filecoin network remained intact.

The speed with which the underlying issue was detected, diagnosed, and fixed, and with which the fix was deployed, was also notable:

1. Automatic monitoring triggered an alarm within 15 minutes of the incident.
2. Within thirty minutes, miners and implementation developers had come together to respond.
3. Within four hours, the developers had identified the issue and released a fix.
4. Within seven hours, enough nodes had adopted the fix to exceed the power threshold for majority consensus, putting the network on the path to recovery.

This is an incredibly fast response for a young decentralized network. Even established blockchains experience chain halts and forks, and the time Filecoin took to resolve this event is comparable to that of blockchains that have been running for years. The entire community should be proud of the speed with which this event was handled.

Building a blockchain is like building a rocket. There are so many complex technologies involved that it's hard to get everything right on the first try. Just as with a real rocket, unexpected events can be hard to anticipate. When they do happen, it's important to have the infrastructure in place to resolve the issue as quickly as possible, minimize the impact, and reduce the likelihood of the problem happening again. To this end, multiple teams have worked on writing and acting on post-mortems, identifying gaps in test coverage for actors as well as improvements to alerting and issue escalation across network infrastructure and communications, to help mitigate future incidents.

With the concerted efforts of the entire Filecoin community, this new technology will continue to be improved. We believe that the entire network will keep improving through the process of discovering and solving problems, and will eventually form a stable and reliable "launchable" platform.