The rise of data center switches in the AI era

There is no doubt that AI (Artificial Intelligence) will reach and influence every corner of the world. Three key factors are accelerating the arrival of this era: computing power, algorithms, and data, a trinity. In algorithms, innovative startups keep emerging; the explosive growth of data provides ample raw material for analysis and prediction. But what about computing power?

Sustained computing power comes from the continuous upgrading and optimization of infrastructure. The three pillars of infrastructure are computing, storage, and networking, again a trinity.

To make AI run more efficiently, data center computing and storage have undergone huge technological change: storage media have evolved from mechanical hard disk drives (HDDs) to flash-based solid-state drives (SSDs) to meet real-time access requirements, and computing now uses GPUs and even dedicated AI chips to meet the demand for efficient computation. With storage and computing performance both greatly improved, the network has become the last bottleneck that must be broken in the AI era.

Aren't you curious what a data center switch for the AI era looks like?


New challenges for data centers in the AI era

Huawei took the lead in giving an answer: the CloudEngine 16800, the industry’s first data center switch built for the AI era. It has the industry’s largest switching capacity and highest performance and, more unusually, comes with a built-in “AI brain.”

Why is the CloudEngine 16800 ready for the AI era?

The fourth technological revolution, with AI as its engine, is bringing us into a new era in which everything is sensed, everything is connected, and everything is intelligent. According to Huawei’s GIV (Global Industry Vision) 2025 forecast, by 2025 the volume of new data will reach 180 ZB, and 95% of unstructured data (including voice and video) will be processed by AI. The enterprise adoption rate of AI will jump from 25% in 2018 to 86% in 2025, and more and more companies will use AI to support decision-making, reshape business models and ecosystems, and rebuild the customer experience. Hu Kewen, president of Huawei’s network product line, pointed out that the evolution of data centers from the cloud era to the AI era has become inevitable.

So what new challenges will data centers face in the AI era?

The first challenge is packet loss. The traditional Ethernet in common use has a packet loss rate of about one in a thousand, and that is in a good case. A rate of one in a thousand would seem insignificant in other fields, but here it means that only about 50% of the computing power can be put to use. Huawei found this through actual testing. In the AI era, the network should achieve zero packet loss.
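
To get a feel for why a loss rate of one in a thousand matters, here is a minimal sketch in Python. It assumes that packet losses are independent and asks how likely it is that a transfer of a given size sees at least one loss. The independence assumption, the function name, and the packet counts are choices made for this illustration only; this is not Huawei's test method, and it does not reproduce the 50% figure.

```python
def prob_transfer_hit_by_loss(loss_rate: float, packets: int) -> float:
    """Probability that at least one of `packets` packets is lost.

    Assumes each packet is lost independently with probability
    `loss_rate` (a simplifying assumption of this sketch).
    """
    if not 0.0 <= loss_rate <= 1.0:
        raise ValueError("loss_rate must be between 0 and 1")
    if packets < 0:
        raise ValueError("packets must be non-negative")
    return 1.0 - (1.0 - loss_rate) ** packets


if __name__ == "__main__":
    # One-in-a-thousand loss, for transfers of increasing size.
    for n in (10, 100, 1000, 10000):
        p = prob_transfer_hit_by_loss(0.001, n)
        print(f"{n:>6} packets: {p:.1%} chance of at least one loss")
```

Under these assumptions, a transfer of 1,000 packets is touched by loss roughly 63% of the time, which suggests why large synchronized transfers are sensitive to loss rates that look negligible per packet.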

The second challenge is that, even with zero packet loss, bandwidth is not enough. “In the next five years, the digital flood will keep surging. AI data is becoming more concentrated, data centers are getting bigger, demand for bandwidth is growing, and traffic between servers is more frequent,” Hu Kewen said. “Server network ports have gone from 10G to 25G to 100G, and the pace of upgrades is beyond imagination. In particular, growth from 25G to 100G in China far exceeds that of other regions in the world.” Growth in server computing power is a major source of demand, interconnection between servers has also changed greatly, and even 100G networks struggle to meet the business needs of the AI era.

The third challenge is convergence and operations. In the data center, the computing network, storage network, and data network are being converged into one, which means that computing, storage, and networking are integrated. Without ample bandwidth, this convergence could be a disaster. Another headache for network administrators is how to locate a fault quickly and accurately and clear it in time when one occurs. Traditional manual operation and maintenance (O&M) has become unsustainable, and innovative technologies are urgently needed to strengthen intelligent O&M.

Hu Kewen said: “I have visited many customers in the past year. They generally report that for the past three years almost all their energy went into how to deploy a ‘cloud’, but once the cloud system was really in place, they suddenly discovered that the network had become the new bottleneck. Users are eager to know what the future data center network should look like.”

What should the data center network in the AI era look like? With the CloudEngine 16800, Huawei defines three characteristics of a data center switch for the AI era: an embedded AI chip, 48 × 400GE ports per slot, and the ability to evolve toward an autonomous driving network, once again a trinity. Together these address the three challenges described above.

With an AI chip on board, is 100% AI computing power hard to reach?

Since it is a data center switch for the AI era, how could it not have an AI “core”?

The CloudEngine 16800 is the industry’s first data center switch with a high-performance AI chip: a Huawei Ascend AI chip embedded in the switch’s main control board. The chip is built on a 12 nm process, has a maximum power consumption of only 8 W, and delivers up to 8 TFLOPS of floating-point computing power. It is especially good at running deep learning algorithms. According to estimates, one such AI chip can exceed the computing power of 25 current mainstream dual-CPU servers.

Based on Huawei’s original intelligent lossless switching algorithm, the CloudEngine 16800 can learn from and train on network traffic in real time and dynamically set optimal network parameters according to the traffic model of each service. This lets it control traffic more precisely, with application-based queues for millions of flows and network-wide self-optimization that adapts to different scenarios, so that the data center network achieves the highest throughput while transmitting without packet loss. Such an intelligent lossless data center network overcomes the computing power lost to packet loss on traditional Ethernet, raising usable AI computing power from 50% to 100%, and improves data storage IOPS (Input/Output Operations Per Second) by 30%.
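
The article does not describe how the algorithm sets its parameters. Purely to illustrate the general idea of adjusting a network parameter from observed traffic, the toy sketch below tunes a congestion-marking threshold from queue-depth samples. The rule (back off sharply near overflow, otherwise probe upward), the function name, and all values are hypothetical; this is not Huawei's algorithm and involves no machine learning.

```python
def adjust_marking_threshold(threshold_kb: float, queue_kb: float,
                             buffer_kb: float, step_kb: float = 10.0,
                             backoff: float = 0.5,
                             min_kb: float = 10.0) -> float:
    """Return a new congestion-marking threshold from one queue sample.

    Hypothetical rule for illustration: if the queue is close to
    overflowing, shrink the threshold so senders are signalled to slow
    down earlier; otherwise raise it a little to allow more throughput.
    """
    if queue_kb >= 0.9 * buffer_kb:
        # Near overflow: packet loss is imminent, so back off sharply.
        new_threshold = threshold_kb * backoff
    else:
        # Headroom available: probe upward in small steps.
        new_threshold = threshold_kb + step_kb
    # Keep the threshold between the floor and the buffer size.
    return max(min_kb, min(new_threshold, buffer_kb))


if __name__ == "__main__":
    threshold = 100.0
    for depth in (200.0, 300.0, 950.0, 400.0):  # made-up queue samples in KB
        threshold = adjust_marking_threshold(threshold, depth, buffer_kb=1000.0)
        print(f"queue={depth:6.0f} KB -> threshold={threshold:6.1f} KB")
```

A real system would have to weigh throughput against latency across many queues at once, which is presumably where learning from traffic comes in.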

100GE has arrived; is 400GE still far away?

The data center is where Internet traffic converges, and new services such as enterprise AI are driving data center server access from 10G to 25G and even 100G. Today, large Internet companies, represented by BAT (Baidu, Alibaba, and Tencent), and telecom operators have largely moved to 100G. Many enterprises adopt 100G server connections because AI training involves synchronizing a large number of model parameters, which places heavy demands on network bandwidth and throughput. Driven by digitization and AI services, a “new Moore’s Law” of traffic has begun to take effect: data center traffic will double every 24 months. To cope with the large data volumes and mixed services of the AI era, upgrading the network from 100G to 400G is just around the corner. Standardization of the 400GE interface began in 2015, and the interface is now standardized for data center applications.
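
The doubling rule and the bandwidth argument can be put into numbers with a short Python sketch. The 24-month doubling period comes from the text; the function names and the 10 GB parameter size are assumptions chosen only for illustration, and the transfer time is an idealized figure that ignores protocol overhead and congestion.

```python
def projected_traffic(current: float, months: float,
                      doubling_months: float = 24.0) -> float:
    """Traffic after `months`, if it doubles every `doubling_months`."""
    return current * 2.0 ** (months / doubling_months)


def transfer_seconds(size_gigabytes: float, link_gbps: float) -> float:
    """Ideal time to move `size_gigabytes` over a `link_gbps` link.

    Ignores protocol overhead and congestion (assumption of this sketch).
    """
    return size_gigabytes * 8.0 / link_gbps


if __name__ == "__main__":
    # Relative traffic after two, four, and six years of doubling.
    for years in (2, 4, 6):
        print(f"after {years} years: {projected_traffic(1.0, 12 * years):.0f}x")
    # Hypothetical 10 GB of model parameters per synchronization.
    for gbps in (100, 400):
        print(f"{gbps}G link: {transfer_seconds(10.0, gbps):.2f} s per sync")
```

With these illustrative numbers, each synchronization takes a quarter of the time on a 400G link compared with a 100G link.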

The CloudEngine 16800 comprehensively upgrades the hardware switching platform. Built on an orthogonal architecture, it overcomes technical challenges such as ultra-high-speed signal transmission, heat dissipation, and efficient power supply, so that a single slot can carry the industry’s highest-density 48-port 400GE line card. The chassis provides up to 768 400GE ports, the largest in the industry, with a switching capacity up to five times the industry average, fully meeting the traffic growth of the AI era.
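
The port figures above can be checked with simple arithmetic. The sketch below derives how many 48-port line cards are needed for 768 ports and sums the raw port rates. The constant and function names are made up for this example, and the summed rate is a per-direction total of port speeds, not Huawei's published switching-capacity figure.

```python
PORTS_PER_CARD = 48      # 400GE ports per line card (from the text)
TOTAL_PORTS = 768        # 400GE ports per chassis (from the text)
PORT_SPEED_GBPS = 400    # nominal rate of one 400GE port


def line_cards_needed(total_ports: int, ports_per_card: int) -> int:
    """Smallest number of line cards that provides `total_ports`."""
    return -(-total_ports // ports_per_card)  # ceiling division


def aggregate_tbps(ports: int, speed_gbps: int) -> float:
    """Sum of nominal port rates in Tbit/s, counted in one direction."""
    return ports * speed_gbps / 1000


if __name__ == "__main__":
    cards = line_cards_needed(TOTAL_PORTS, PORTS_PER_CARD)
    total = aggregate_tbps(TOTAL_PORTS, PORT_SPEED_GBPS)
    print(f"{cards} line cards, {total} Tbit/s of port capacity per direction")
```

So 768 ports correspond to 16 fully populated line cards, and the ports add up to 307.2 Tbit/s in each direction.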

The CloudEngine 16800 uses a number of new materials and processes to ensure a compatible evolution from 100G to 400G. For example, in moving from 100G to high-density 400G, the first test is high-speed signal transmission. In a 400G interface system, the interconnect signal rate is above 53G, and each time the signal rate doubles, signal attenuation on the PCB increases by more than 20%. With the copper foil material and manufacturing process of conventional circuit boards, loss and high-frequency interference become very serious as the transmission rate rises, which imposes a rate limit. Huawei uses a new sub-micron lossless material and polymer bonding technology to increase the transmission efficiency of electrical signals by 30%. As another example, Huawei uses the industry’s first dual-input intelligent switching power module and optimizes energy efficiency through SuperPower, saving 50% of the space taken by power supplies and reaching 90% power supply efficiency.

O&M is already automated, so what is different about the autonomous driving network?

Currently, computing, storage, and networking are rapidly converging. Data center server clusters are getting larger, and the traffic to be analyzed is growing a thousandfold. The interval for reporting or collecting information has shrunk from minutes to milliseconds, and the information is also redundant. As a result, the intelligent O&M platform keeps growing in scale and the pressure on its performance has risen sharply. How can this pressure be reduced? The key to improving O&M efficiency is to give the network devices closest to the servers and to the data intelligent analysis and decision-making capabilities.

With its built-in AI chip, the CloudEngine 16800 can greatly raise the level of intelligence at the network edge, that is, at the device level, giving the switch local inference and real-time decision-making capabilities. Local intelligence combined with the centralized FabricInsight network analyzer forms a distributed intelligent O&M architecture that identifies faults within seconds and automatically locates them within minutes, accelerating the arrival of the autonomous driving network. At the same time, this architecture greatly improves the flexibility and deployability of the O&M system.
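
The idea of analyzing data on the device and sending only what matters to the central platform can be sketched as follows. This is a toy illustration with a hypothetical class, metric, and threshold rule: it is not how the CloudEngine 16800 or FabricInsight works, and it uses a simple moving average in place of AI inference.

```python
from collections import deque
from statistics import mean


class LocalLatencyMonitor:
    """Keeps a sliding window of samples on the device and flags only
    the samples that look anomalous, so that normal millisecond-level
    samples never have to be sent to the central platform."""

    def __init__(self, window: int = 100, factor: float = 3.0,
                 warmup: int = 10) -> None:
        self.samples = deque(maxlen=window)  # recent normal samples
        self.factor = factor                 # anomaly = factor x baseline
        self.warmup = warmup                 # samples needed before judging

    def observe(self, latency_us: float) -> bool:
        """Record one sample; return True if it should be reported."""
        anomalous = (
            len(self.samples) >= self.warmup
            and latency_us > self.factor * mean(self.samples)
        )
        if not anomalous:
            # Only normal samples update the local baseline.
            self.samples.append(latency_us)
        return anomalous


if __name__ == "__main__":
    monitor = LocalLatencyMonitor()
    stream = [10.0] * 50 + [100.0] + [11.0] * 5   # made-up latency samples
    reported = [x for x in stream if monitor.observe(x)]
    print(f"{len(stream)} samples seen, {len(reported)} reported upstream")
```

In this made-up stream, only the single outlier would leave the device, which is the kind of reduction in reporting load that the paragraph above describes.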

Why Huawei?

In the ICT era, Huawei always seems to appear in the “window” of every change.

On August 8, 2012, Huawei released the CloudEngine 12800 data center switch for the cloud computing era and officially entered the data center network field. The advanced architecture of the CloudEngine 12800 set the design trend for high-density 100G data center switch platforms. Its design concepts, such as the orthogonal architecture, front-to-rear airflow, and air intake through the panel, have been imitated by later data center switches in the industry.

Since the launch of the CloudEngine 12800, Huawei’s data center network solution has been well received by the industry, and its sales revenue has grown rapidly for six consecutive years. IDC reports show that Huawei’s network products have ranked first in the domestic market since 2016 and, globally, ranked first in compound growth rate between 2013 and 2017. Huawei was also placed among the Leaders in the 2018 Forrester Wave.

With the arrival of the AI era, Huawei has again taken the lead, launching the AI-era data center switch CloudEngine 16800 and setting a new benchmark for the industry.

As the Chinese saying goes, a sword’s edge comes from grinding, and the plum blossom’s fragrance comes from bitter cold. No star product is born without being tempered and carefully crafted in every detail. The CloudEngine 16800 may seem to have appeared out of nowhere, but without the accumulation, innovation, and repeated refinement since 2012, there would be no blockbuster CloudEngine 16800 today. Huawei’s CloudFabric Smart Cloud Data Center Network Solution is now in commercial use at more than 6,400 enterprises around the world, helping customers in industries such as finance, Internet, and telecom operators with their digital transformation and turning data centers into centers of commercial value creation.

January 9, 2019 is a moment to remember: the data center network has entered the AI era!


Amra Author
