Environment Requirements and Capacity Planning¶
Environment Requirements¶
The following table lists the system and hardware requirements for the performance test environment and the production environment. You can also refer to the Capacity Planning section to tailor the deployment plan to your cluster's actual capacity needs. Note that because the DataNode relies on certain Linux kernel features, servers that run a DataNode must use kernel version 3.10 or later.
To speed up metadata reads and writes, the MetaNode keeps metadata in memory, whereas the DataNode mainly consumes disk resources. To make full use of node resources, you can deploy a DataNode and a MetaNode together on the same node.
Role | Spec | Test | Production |
---|---|---|---|
Master | CPU | >=4C | >=8C |
Master | Memory | >=4G | >=16G |
Master | Kernel | >=3.10 | >=3.10 |
Master | Nodes | 3 | 3 |
DataNode | CPU | >=4C | >=4C |
DataNode | Memory | >=4G | >=8G |
DataNode | Kernel | >=3.10 | >=3.10 |
DataNode | Disk Capacity | >=1TB | >=2TB |
DataNode | Disk Type | SATA or SSD | SATA or SSD |
DataNode | File System | xfs or ext4 | xfs or ext4 |
DataNode | Nodes | >=3 | 100~1000 |
MetaNode | CPU | >=4C | >=8C |
MetaNode | Memory | >=8G | >=16G |
MetaNode | Kernel | >=3.10 | >=3.10 |
MetaNode | Nodes | >=4 | 100~1000 |
Client | CPU | >=2C | >=2C |
Client | Memory | >=4G | >=1G |
Client | Kernel | >=3.10 | >=3.10 |
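The kernel requirement can be checked on each server before deployment. The following is a minimal sketch and not part of ChubaoFS: the function names are hypothetical, and it assumes the release string begins with `major.minor`, as `uname -r` normally reports.

```python
import platform
import re
from typing import Tuple

# Minimum kernel version taken from the requirements table (>= 3.10).
MIN_KERNEL = (3, 10)


def parse_kernel_release(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release string such as '3.10.0-957.el7.x86_64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_is_supported(release: str, minimum: Tuple[int, int] = MIN_KERNEL) -> bool:
    """Return True if the release is at or above the minimum version."""
    return parse_kernel_release(release) >= minimum


if __name__ == "__main__":
    release = platform.release()  # same string as `uname -r` on Linux
    status = "OK" if kernel_is_supported(release) else "too old"
    print(f"kernel {release}: {status}")
```

Comparing `(major, minor)` tuples avoids the pitfall of comparing version strings lexically, where "3.9" would sort after "3.10".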
Capacity Planning¶
First, estimate the highest number of files and the largest storage capacity you expect the cluster to reach. Second, take stock of the machine resources you currently have: the total memory, CPU cores, and disks on each machine. With these figures in hand, you can use the empirical reference values below to see which scale your current environment falls into and how much file data it can carry, or how many resources you need to prepare for the expected file volume so that you avoid frequent expansion of machine resources.
Total File Count | Total File Size | Total memory | Total Disk Space |
---|---|---|---|
1,000,000,000 | 10PB | 2048 GB | 10PB |
The higher the proportion of large files, the greater the MetaNode pressure.
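The reference row can serve as a rough starting point for sizing. The sketch below assumes that total memory scales linearly with file count, which the source does not state; the function names and the example per-node figures are hypothetical, so treat the result as an estimate only.

```python
import math

# Reference row from the table above: 1 billion files need 2048 GB of total memory.
REF_FILE_COUNT = 1_000_000_000
REF_MEMORY_GB = 2048


def estimate_total_memory_gb(file_count: int) -> float:
    """Scale the reference memory figure linearly with file count (an assumption)."""
    return file_count * REF_MEMORY_GB / REF_FILE_COUNT


def nodes_needed(total_required: float, per_node: float) -> int:
    """Number of nodes needed to supply total_required, rounding up."""
    if per_node <= 0:
        raise ValueError("per_node must be positive")
    return math.ceil(total_required / per_node)


if __name__ == "__main__":
    files = 500_000_000  # hypothetical target file count
    memory_gb = estimate_total_memory_gb(files)
    print(f"estimated total memory: {memory_gb:.0f} GB")
    # 128 GB per MetaNode is a hypothetical machine size.
    print(f"MetaNodes with 128 GB each: {nodes_needed(memory_gb, 128)}")
```

Because a higher proportion of large files increases MetaNode pressure, it is prudent to leave headroom on top of such a linear estimate.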
If your current resources are already adequate, you do not need to provision for all future capacity growth at once. Instead, keep an eye on the capacity warnings of the MetaNode and DataNode, and add MetaNodes or DataNodes dynamically when memory or disk space is about to run out. In other words, if disk space is insufficient, add disks or add DataNodes; if memory on all MetaNodes is nearly full, add MetaNodes to relieve the memory pressure.
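The expansion rule described above can be expressed as a small decision helper. This is only an illustrative sketch: the 0.8 threshold, the use of an average for disk usage, and the function name are assumptions, and the usage ratios would have to come from your own monitoring.

```python
from typing import List, Sequence


def expansion_advice(
    metanode_memory_usage: Sequence[float],
    datanode_disk_usage: Sequence[float],
    threshold: float = 0.8,  # hypothetical warning level, not a ChubaoFS default
) -> List[str]:
    """Suggest expansion actions from per-node usage ratios in the range 0..1."""
    advice = []
    # Add MetaNodes only when every MetaNode is short on memory.
    if metanode_memory_usage and all(u >= threshold for u in metanode_memory_usage):
        advice.append("add MetaNode")
    # Add disks or DataNodes when disk space is running out across the cluster.
    if datanode_disk_usage:
        average = sum(datanode_disk_usage) / len(datanode_disk_usage)
        if average >= threshold:
            advice.append("add disks or DataNode")
    return advice


if __name__ == "__main__":
    print(expansion_advice([0.91, 0.88, 0.86], [0.42, 0.55, 0.61]))
```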
Multi-Zone Deploy¶
If you want the cluster to tolerate the failure of an entire data center, you can deploy a ChubaoFS cluster across data centers. Note that communication latency between data centers is higher than within a single data center. If high availability matters more to you than low latency, choose a cross-data-center deployment; if performance matters more, deploy the cluster in a single data center.
Configuration: set the zoneName parameter in the DataNode/MetaNode configuration file to the name of the data center the node is in, then start the DataNode/MetaNode process. The Master stores and records the zone when the DataNode/MetaNode registers.
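Setting zoneName across many nodes can be scripted. The sketch below assumes the configuration file is JSON with zoneName as a top-level key; the helper name, the file name, and the zone value are hypothetical, so check them against your actual configuration before use.

```python
import json
from pathlib import Path


def set_zone_name(config_path: str, zone: str) -> dict:
    """Set zoneName in a JSON config file and return the updated config."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    config["zoneName"] = zone  # assumed to be a top-level key
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


if __name__ == "__main__":
    # Hypothetical file name and zone; run this before starting the process.
    example = Path("datanode.json")
    if example.exists():
        print(set_zone_name(str(example), "zone-a")["zoneName"])
```

The change takes effect only when the DataNode/MetaNode process starts and registers with the Master, so apply it before starting the process.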
Create a single zone volume:
$ cfs-cli volume create {name} --zone-name={zone}
To prevent volume initialization from failing in a single zone, make sure the zone has at least 3 DataNodes and at least 4 MetaNodes.
Create a cross-zone volume:
$ cfs-cli volume create {name} --cross-zone=true
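The node-count precondition and the two commands above can be combined in a small helper. This is a sketch under stated assumptions: the function names are hypothetical, the node counts must come from your own inventory, and only the flags shown in this section are used.

```python
from typing import List, Optional

# Minimum node counts per zone for single-zone volume creation, from the text above.
MIN_DATANODES_PER_ZONE = 3
MIN_METANODES_PER_ZONE = 4


def zone_can_host_volume(datanodes: int, metanodes: int) -> bool:
    """Return True if a zone meets the minimum node counts."""
    return datanodes >= MIN_DATANODES_PER_ZONE and metanodes >= MIN_METANODES_PER_ZONE


def volume_create_command(name: str, zone: Optional[str] = None) -> List[str]:
    """Build the cfs-cli argument list; a cross-zone volume when no zone is given."""
    if zone is None:
        return ["cfs-cli", "volume", "create", name, "--cross-zone=true"]
    return ["cfs-cli", "volume", "create", name, f"--zone-name={zone}"]


if __name__ == "__main__":
    # Hypothetical inventory: 3 DataNodes and 4 MetaNodes in "zone-a".
    if zone_can_host_volume(datanodes=3, metanodes=4):
        print(" ".join(volume_create_command("vol1", zone="zone-a")))
```

The helper only builds the argument list; pass it to a process runner of your choice once you have confirmed the flags against your installed cfs-cli version.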