Technology | How is the IPFS network formed?

Review

IPFS (InterPlanetary File System) is a peer-to-peer distributed file storage system. Its vision is to build a worldwide distributed network that replaces the traditional centralized server model. All IPFS nodes together form this network, and each node can store files. Users locate and retrieve files through a DHT (Distributed Hash Table), which makes the network fully decentralized; the long-term aim is to replace the existing World Wide Web. The main features of IPFS include DHT networking, file storage, and Bitswap file exchange.

The technical details of file storage and file exchange were covered in previous posts. Today we look at the "foundation" of this file system: the network module.

Introduction to IPFS Network

IPFS is an open source project. To achieve its stated goal of building a distributed network around the world, it must first solve the problem of connecting nodes in different countries and regions.

First, let’s take a look at the configuration of the IPFS network. As shown in the figure below, the Swarm entry in the red box lists the network addresses that IPFS listens on. It supports IPv4 and IPv6, and QUIC is enabled by default.
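Swarm listening addresses are written as multiaddrs, a path-like format in which each component names a protocol and, where needed, its value. The sketch below lists addresses typical of go-ipfs defaults from around the time of this article (an assumption; check your own configuration file) and splits each one into its components. The protocol table is a small illustrative subset, and parse_multiaddr is a hypothetical helper, not part of any IPFS library.

```python
# Assumed listening addresses; your configuration file may differ.
ASSUMED_SWARM_ADDRESSES = [
    "/ip4/0.0.0.0/tcp/4001",
    "/ip6/::/tcp/4001",
    "/ip4/0.0.0.0/udp/4001/quic",
    "/ip6/::/udp/4001/quic",
]

# Illustrative subset of multiaddr protocols, split by whether they take a value.
PROTOCOLS_WITH_VALUE = {"ip4", "ip6", "tcp", "udp", "p2p", "ipfs", "dns4", "dns6"}
PROTOCOLS_WITHOUT_VALUE = {"quic", "p2p-circuit", "ws"}


def parse_multiaddr(addr):
    """Split a multiaddr string into a list of (protocol, value) pairs."""
    if not addr.startswith("/"):
        raise ValueError("a multiaddr must start with '/'")
    parts = addr[1:].split("/")
    components = []
    i = 0
    while i < len(parts):
        proto = parts[i]
        if proto in PROTOCOLS_WITH_VALUE:
            if i + 1 >= len(parts):
                raise ValueError(f"protocol {proto!r} is missing its value")
            components.append((proto, parts[i + 1]))
            i += 2
        elif proto in PROTOCOLS_WITHOUT_VALUE:
            components.append((proto, None))
            i += 1
        else:
            raise ValueError(f"unknown protocol {proto!r}")
    return components


def transports(addresses):
    """Return the set of transport protocols that appear in the addresses."""
    found = set()
    for addr in addresses:
        for proto, _ in parse_multiaddr(addr):
            if proto in ("tcp", "quic"):
                found.add(proto)
    return found
```

Calling transports(ASSUMED_SWARM_ADDRESSES) on this list reports both tcp and quic, which matches the statement that the node listens over IPv4 and IPv6 with QUIC alongside TCP.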



*The QUIC protocol was first proposed by Google, was later submitted to the Internet Engineering Task Force (IETF), and has since become a formal network specification. QUIC runs over UDP and typically needs fewer round trips than TCP with TLS to establish a connection.

After the IPFS node starts, it prints the log shown in the figure below. The node listens on local, LAN, and WAN addresses, and finally on the /p2p-circuit address.


The question is, why does the node need to listen on so many addresses?

Because IPFS aims to connect nodes around the world, it has to handle node connections under a wide variety of network conditions.

Listening on the local address lets multiple IPFS nodes started on the same host connect to each other. Listening on the LAN address lets multiple nodes in the same intranet connect to each other. Listening on the WAN address lets nodes on the public network connect to each other.

Together, these listening addresses solve the connection problem in most network situations:

Both nodes on the same host: they connect via the 127.0.0.1 address

Both nodes in the same intranet: they connect via LAN addresses

Both nodes with public network addresses: they connect via public network addresses

One node in an intranet, one node on the public network: the intranet node connects to the public network address of the public node
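The four cases above can be sketched as a small address-selection routine. This is a simplified illustration using Python's standard ipaddress module, not how IPFS itself picks addresses; classify_address and choose_dial_address are hypothetical names, and the same_host and same_lan flags are assumed to be known by the caller.

```python
import ipaddress


def classify_address(ip):
    """Label an IP address as 'local', 'lan', or 'wan'."""
    addr = ipaddress.ip_address(ip)
    # Loopback addresses also count as private, so test loopback first.
    if addr.is_loopback:
        return "local"
    if addr.is_private:
        return "lan"
    return "wan"


def choose_dial_address(peer_ips, same_host=False, same_lan=False):
    """Pick the address to dial, or None if no direct path is available.

    peer_ips is the list of IP addresses the remote node advertises.
    """
    by_kind = {}
    for ip in peer_ips:
        # Keep the first address seen for each kind.
        by_kind.setdefault(classify_address(ip), ip)
    if same_host and "local" in by_kind:
        return by_kind["local"]
    if same_lan and "lan" in by_kind:
        return by_kind["lan"]
    # Otherwise only a public address is reachable from outside.
    return by_kind.get("wan")
```

For a peer that advertises only 127.0.0.1 and 192.168.1.20, a caller in another network gets None back. No direct path exists in that situation.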

There is one remaining problem. If two nodes sit in two different intranets, each is behind a NAT device, and those NAT devices may be symmetric. Symmetric NAT generally cannot be traversed by hole punching, so IPFS provides a relay mechanism for nodes in different intranets. The /p2p-circuit listening address mentioned above exists for this purpose: two nodes in different intranets that cannot connect directly establish their connection through a configured relay node.
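Why symmetric NAT defeats direct connections can be shown with a toy model. A cone-style NAT reuses one external port for an internal endpoint regardless of destination, so the port a public node observes is also valid for other peers. A symmetric NAT allocates a separate mapping per destination, so the observed port is useless to anyone else. The classes below are a simplified sketch; the sequential port allocation and all names are assumptions made for illustration.

```python
class ConeNat:
    """Reuses one external port per internal endpoint, whatever the destination."""

    def __init__(self, first_port=40000):
        self._next_port = first_port
        self._mappings = {}

    def _key(self, internal, destination):
        # The mapping depends only on the internal endpoint.
        return internal

    def external_port(self, internal, destination):
        """Return the external port used for traffic from internal to destination."""
        key = self._key(internal, destination)
        if key not in self._mappings:
            self._mappings[key] = self._next_port
            self._next_port += 1
        return self._mappings[key]


class SymmetricNat(ConeNat):
    """Allocates a separate external port for every (internal, destination) pair."""

    def _key(self, internal, destination):
        return (internal, destination)


def port_is_reusable(nat, internal, seed, other_peer):
    """True if the port the seed node observed is also the one other_peer would see."""
    return nat.external_port(internal, seed) == nat.external_port(internal, other_peer)
```

With a ConeNat instance, port_is_reusable returns True. With a SymmetricNat instance it returns False, which is the case where a relay is needed.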

So far, IPFS has solved the problem of establishing connections between nodes in different network environments. Next, let’s take a look at how IPFS builds a large-scale distributed node network to connect nodes in different regions around the world.

IPFS Network Construction

The process of IPFS network construction can be seen as two stages:

▲ Bootstrap stage

Before an IPFS node starts, its Bootstrap nodes must be configured. The relevant part of the configuration file is shown in the figure below. The Bootstrap entry lists the seed nodes that the IPFS node connects to at startup. The list shown is the default; to build a private IPFS network, replace it with your own seed nodes (the string starting with Qm in each entry is the IPFS node ID). The default seed nodes all have public network addresses. At startup, the IPFS node first connects to the seed nodes and then uses them to discover and connect to more nodes in the IPFS network, which is the DHT networking stage.
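As a rough sketch of what replacing the seed list for a private network involves, the code below treats the configuration as a dictionary with a Bootstrap list, extracts the node ID from each entry, and returns a copy with a custom seed list. The addresses and node IDs are made-up placeholders, and peer_id_of and with_private_bootstrap are hypothetical helpers rather than IPFS tooling.

```python
import copy


def peer_id_of(bootstrap_addr):
    """Return the node ID at the end of a bootstrap address."""
    parts = bootstrap_addr.split("/")
    # The node ID follows a final /p2p/ (or legacy /ipfs/) component.
    if len(parts) >= 3 and parts[-2] in ("p2p", "ipfs") and parts[-1]:
        return parts[-1]
    raise ValueError(f"no node ID found in {bootstrap_addr!r}")


def with_private_bootstrap(config, seed_addrs):
    """Return a copy of config whose Bootstrap list is replaced by seed_addrs."""
    for addr in seed_addrs:
        peer_id_of(addr)  # reject entries that carry no node ID
    new_config = copy.deepcopy(config)
    new_config["Bootstrap"] = list(seed_addrs)
    return new_config


# Placeholder values for illustration only.
EXAMPLE_CONFIG = {
    "Bootstrap": ["/dnsaddr/seed.example.org/p2p/QmDefaultSeedExample"],
}
PRIVATE_SEEDS = [
    "/ip4/203.0.113.10/tcp/4001/p2p/QmPrivateSeedExample1",
    "/ip4/203.0.113.11/tcp/4001/p2p/QmPrivateSeedExample2",
]
```

Calling with_private_bootstrap(EXAMPLE_CONFIG, PRIVATE_SEEDS) leaves the original dictionary untouched and returns one that points only at the private seeds.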



▲ DHT networking stage

After the IPFS node successfully connects to a seed node, it discovers other nodes through the DHT. For a detailed explanation of the DHT, see the article "Detailed Explanation of DHT and Bitswap in Libp2p".
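Discovery through the DHT can be pictured as a Kademlia-style iterative lookup: a node asks the peers it already knows that are closest (by XOR distance) to the target, learns new peers from their answers, and repeats. The sketch below is a toy model under stated assumptions: routing tables are plain dictionaries, node IDs are short strings hashed with SHA-256, and k is tiny. The real DHT uses buckets, parallel queries, and network requests, none of which are modeled here.

```python
import hashlib


def dht_key(node_id):
    """Map a node ID string to an integer key."""
    return int.from_bytes(hashlib.sha256(node_id.encode()).digest(), "big")


def closest_peers(target, candidates, k):
    """Return up to k candidates ordered by XOR distance to target."""
    target_key = dht_key(target)
    return sorted(candidates, key=lambda p: dht_key(p) ^ target_key)[:k]


def find_addresses(start, target, tables, k=2):
    """Look up the addresses of target, starting from what start already knows.

    tables maps each node ID to the peers it knows: {peer_id: [addresses]}.
    Returns the address list of target, or None if the lookup runs dry.
    """
    known = dict(tables[start])
    queried = {start}
    while target not in known:
        unqueried = [p for p in known if p not in queried]
        if not unqueried:
            return None
        for peer in closest_peers(target, unqueried, k):
            queried.add(peer)
            # "Ask" the peer: merge the peers it knows into our view.
            for peer_id, addrs in tables.get(peer, {}).items():
                known.setdefault(peer_id, addrs)
    return known[target]


# Toy network: node2 and node3 initially know only the seed node, node1.
TABLES = {
    "node1": {"node2": ["/ip4/8.8.4.4/tcp/31001"], "node3": ["/ip4/1.1.1.1/tcp/31002"]},
    "node2": {"node1": ["/ip4/8.8.8.8/tcp/4001"]},
    "node3": {"node1": ["/ip4/8.8.8.8/tcp/4001"]},
}
```

In this toy network, find_addresses("node2", "node3", TABLES) asks node1 and comes back with the address node1 recorded for node3.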

After finding other nodes, the node tries to connect to them. Nodes it connects to successfully are added to its node list so that it can communicate with them directly later. Given how many IPFS nodes exist worldwide, no node can maintain long-lived connections to all the others, so the number of connections per node is limited. It is generally kept below 1,000 (configurable in the IPFS configuration file). When a node needs to communicate with a peer it is not connected to, it looks up that peer's address through the DHT and then connects to it. In this way a large-scale distributed node network is formed.
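The connection limit can be pictured as a pair of watermarks: once the number of connections exceeds a high mark, the node trims back down to a low mark. The sketch below assumes watermarks of 600 and 900, which go-ipfs configurations of that period commonly used, but treat these numbers as an assumption and check your own configuration. Dropping the oldest connections first is also an assumption made for brevity; the real connection manager is more involved.

```python
from collections import OrderedDict


class ConnectionManager:
    """Keeps the number of open connections between two watermarks."""

    def __init__(self, low_water=600, high_water=900):
        if low_water > high_water:
            raise ValueError("low_water must not exceed high_water")
        self.low_water = low_water
        self.high_water = high_water
        self._connections = OrderedDict()  # node ID -> address, oldest first

    def add(self, node_id, address):
        """Record a connection and trim if the high watermark is exceeded."""
        self._connections[node_id] = address
        self._connections.move_to_end(node_id)
        if len(self._connections) > self.high_water:
            self._trim()

    def _trim(self):
        # Close the oldest connections until only low_water remain.
        while len(self._connections) > self.low_water:
            self._connections.popitem(last=False)

    def is_connected(self, node_id):
        return node_id in self._connections

    def __len__(self):
        return len(self._connections)
```

A node that has been trimmed from the list is not lost: its address can be found again through the DHT when communication is needed.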

We can illustrate the above process with an example. The following figure shows a common network topology in which three networks are connected to the Internet. IPFS node1 is deployed on a server with a public IP address and can be reached directly from outside. IPFS node2 and IPFS node3 are deployed behind symmetric NAT devices and cannot be reached from outside.


In the above network architecture, IPFS node1 on the public network serves as the seed node. It is started first, and IPFS node2, node3, node4, and node5 are each configured with IPFS node1 as their seed node. After starting, they first connect to IPFS node1; once connected, they discover other nodes through the DHT and connect to them. The node address list that IPFS node1 ends up with is shown in the figure below. Since IPFS node2, node3, node4, and node5 are all behind NAT devices, the ports recorded for them in the node list of IPFS node1 are the ports mapped by the NAT devices rather than the local listening port (IPFS listens on port 4001 by default).

For IPFS node3, the address of IPFS node1 in its node address list is a public network address. Since IPFS node3 and IPFS node2 are both behind NAT devices and cannot connect directly, the address of IPFS node2 is a relay address with IPFS node1 acting as the relay node: when IPFS node3 sends a message to IPFS node2, it is forwarded through IPFS node1. The relay address format is:

Relay node address/p2p-circuit/p2p/target node ID
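A minimal sketch of composing and splitting an address in this format follows. The relay address and node IDs are made-up placeholders, the relay address is assumed to already end with the node ID of the relay, and both function names are hypothetical.

```python
CIRCUIT_MARKER = "/p2p-circuit/p2p/"


def build_relay_address(relay_addr, target_id):
    """Join a relay node address and a target node ID into a relay address."""
    if not relay_addr or not target_id:
        raise ValueError("both the relay address and the target ID are required")
    return relay_addr.rstrip("/") + CIRCUIT_MARKER + target_id


def split_relay_address(addr):
    """Split a relay address into (relay node address, target node ID)."""
    relay_addr, marker, target_id = addr.partition(CIRCUIT_MARKER)
    if not marker or not relay_addr or not target_id:
        raise ValueError(f"not a relay address: {addr!r}")
    return relay_addr, target_id


# Placeholder values: node1 relays traffic destined for node2.
NODE1_ADDR = "/ip4/8.8.8.8/tcp/4001/p2p/QmNode1Example"
NODE2_VIA_NODE1 = build_relay_address(NODE1_ADDR, "QmNode2Example")
```

Here NODE2_VIA_NODE1 is the kind of entry IPFS node3 would hold for IPFS node2: the first part says where to send traffic, and the part after /p2p-circuit names the node that should finally receive it.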

In the node address list of IPFS node3, the addresses of IPFS node4 and IPFS node5 are both LAN addresses. This completes the networking of public network nodes and LAN nodes behind NAT devices.


Summary

The above is the process of establishing the IPFS network. For the convenience of description, only a few IPFS nodes are taken as examples.

In practice, this approach also scales to very large networks. When there are tens of thousands of nodes, a few dozen can be designated as seed nodes, and DHT networking connects the rest. Each node then maintains on the order of hundreds of long-lived connections. When two nodes without an established connection need to communicate, one queries the DHT by node ID for the address information of the other (an address list that includes all of its public network and LAN addresses) and then connects through one of those addresses.

The way IPFS organizes its network is a useful reference for other distributed systems.

About the Author

Yao Wenhao is a data platform architect on the BitXMesh team at Data Grid Lab.

