Clusters – Full Solutions Stack Provider
Accelerate your team’s AI progress with our plug & play rack integration solutions, delivered with our optimized, lab-tested, and verified GPU clusters.
We’re your single vendor managing your entire cluster for compute, storage, and networking workloads.
Save time and effort with our fully populated racks delivered to you completely assembled, racked, stacked, cabled, and labeled. Once your components have been set up, the system can be powered on, tested, and integrated into your environment.
Equus helps you eliminate the labor-intensive tasks required to install new server clusters, storage devices and networking equipment by utilizing our experience delivering scalable rack solutions. You will feel at ease knowing that you’ll get consistency and quality, regardless of the specific requirements for your deployment.
Accelerated GPU Compute
Fast, High-Density Storage
Our clusters are designed and tailored for your workloads. Each solution comes complete with provisioning, workload management, a variety of pre-scripted SDK frameworks for performance-optimized AI/HPC applications, and storage software to meet your requirements.
Clustering With Weka
Weka is an NVMe-native, resilient, parallel, and scalable file storage system. It manages the retrieval of data between an operating system and a storage server, and it supports GPU- and CPU-based clusters designed for maximum performance and scalability.
|PCI-Express (PCI-E)||8x AS -1114S-WN10RT (1U 1-Node)||8x AS -2114S-WN24RT (2U 1-Node)||x SYS-220BT-HNTR (2U 4-Node)||4x SYS-220BT-DNTR (2U 2-Node)|
|Storage Media NVMe*||80x (10x 7.68TB/node) KIOXIA Gen4||160x (20x 7.68TB/node) KIOXIA Gen4||48x (6x 7.68TB/node) KIOXIA Gen4||96x (12x 7.68TB/node) KIOXIA Gen4|
|CPU||8x AMD EPYC™ 7402P Processors**||8x AMD EPYC™ 7402P Processors**||16x Intel® Xeon® Silver 4314 Processors||16x Intel® Xeon® Silver 4314 Processors|
|Network||16x single-port Mellanox CX-6 200G VPI||16x single-port Mellanox CX-6 200G VPI||16x single-port Mellanox CX-6 200G VPI||16x single-port Mellanox CX-6 200G VPI|
|IOPS***||10.2M (1.275M per node)||8.4M (1.05M per node)||8.5M (1.06M per node)||11.28M (1.41M per node)|
|Bandwidth***||235GB/s (29.4GB/s per node)||280GB/s (35GB/s per node)||202GB/s (25GB/s per node)||286GB/s (35.8GB/s per node)|
|I/O Latency||< 200μs||< 200μs||< 200μs||< 200μs|
|SW Subscription||1yr, 3yrs, 5yrs||1yr, 3yrs, 5yrs||1yr, 3yrs, 5yrs||1yr, 3yrs, 5yrs|
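The per-node figures in the table follow from dividing each cluster total by the node count. The short Python sketch below reproduces that arithmetic from the table's totals. It assumes 8 nodes per configuration, which is stated for three of the columns (8x 1-node, 8x 1-node, 4x 2-node) and only implied by the per-node figures for the SYS-220BT-HNTR column; the dictionary layout and the `per_node` helper are illustrative names, not part of any vendor tooling.

```python
# Cluster totals copied from the IOPS and Bandwidth rows of the table.
# NODES = 8 is an assumption for the SYS-220BT-HNTR column, whose system
# count is missing in the source; it is consistent with the per-node values.
NODES = 8

clusters = {
    "AS -1114S-WN10RT": {"iops_millions": 10.2, "bandwidth_gb_s": 235},
    "AS -2114S-WN24RT": {"iops_millions": 8.4, "bandwidth_gb_s": 280},
    "SYS-220BT-HNTR": {"iops_millions": 8.5, "bandwidth_gb_s": 202},
    "SYS-220BT-DNTR": {"iops_millions": 11.28, "bandwidth_gb_s": 286},
}


def per_node(total, nodes=NODES):
    """Return the per-node share of a cluster-wide total."""
    return total / nodes


for name, totals in clusters.items():
    iops = per_node(totals["iops_millions"])
    bandwidth = per_node(totals["bandwidth_gb_s"])
    print(f"{name}: {iops:.3f}M IOPS per node, {bandwidth:.1f} GB/s per node")
```

Running the loop prints one line per configuration, which can be compared against the per-node values quoted in parentheses in the table.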
Putting all the Pieces Together
Our rack integration services offer a simpler way to cluster and scale:
- Faster Deployment
- Reduced Costs
- Near-Zero-Defect Builds
- Increased Configuration Control
- Improved Testing Capabilities & Automation
Rack Integration Process
Network Design Review
Rack & Stack
Network & Power
Cable & Label
Provision Switch & IP Address
OS, Customer Image & Software Load
Full Rack Burn-In & QA
Full Rack Test Report & Audit Report
End-To-End Performance Report
Crating or Kitting
Specialty Packaging (Air-Ride, Anti Tip)
White Glove Services
Onsite Installation Manual Creation
Installation (Global Smart Hands)
Onsite Troubleshooting and Diagnosis
Full Rack Testing Automation
All components used in the assembled rack are functionally tested for 100% verification against expected performance (servers, CPUs, memory, hard drives, switches, PDUs, NICs, SFPs, optical cables, network cables, disk controllers, KVM trays).
Network points are verified to communicate between their designated source and destination ports at the expected data speeds and timing. At-speed network traffic tests incorporate comprehensive testing for packet loss, collisions, framing errors, switch fabric and routing problems, and other performance issues not detectable using a simple link test.
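As a rough illustration of how one at-speed link result could be evaluated, the Python sketch below checks a single measurement against the criteria named above: correct destination port, expected speed, packet loss, and framing errors. The `LinkResult` fields, the `link_failures` function, the port names, and the 95% speed threshold are all hypothetical and chosen for this example; they do not describe the actual test tooling.

```python
from dataclasses import dataclass


@dataclass
class LinkResult:
    """One at-speed traffic measurement for a single cable (hypothetical schema)."""
    src_port: str
    dst_port: str            # port actually observed at the far end
    expected_dst_port: str   # port the cabling plan calls for
    measured_gbps: float
    expected_gbps: float
    packets_sent: int
    packets_received: int
    framing_errors: int


def link_failures(result, min_speed_ratio=0.95):
    """Return a list of failure reasons; an empty list means the link passes.

    min_speed_ratio is an assumed tolerance, not a documented value.
    """
    failures = []
    if result.dst_port != result.expected_dst_port:
        failures.append("wrong destination port")
    if result.measured_gbps < result.expected_gbps * min_speed_ratio:
        failures.append("below expected speed")
    if result.packets_received < result.packets_sent:
        failures.append("packet loss")
    if result.framing_errors > 0:
        failures.append("framing errors")
    return failures


good = LinkResult("sw1/1", "node1/p0", "node1/p0", 198.0, 200.0, 1_000_000, 1_000_000, 0)
bad = LinkResult("sw1/2", "node3/p0", "node2/p0", 120.0, 200.0, 1_000_000, 999_990, 0)

print(link_failures(good))
print(link_failures(bad))
```

Returning the list of reasons, instead of a single boolean, keeps the pass or fail decision simple while still recording why a cable was rejected.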
Automated Test Scripts
Automated test sequencing software and scripts make the pass or fail determination, ensuring that all tests are executed consistently and reliably and reducing human intervention and errors on every component and cable. The actual hardware configuration is verified against the specified BOM.
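The BOM check can be pictured as a count comparison between what was ordered and what the rack actually reports. The Python sketch below is a minimal example under that assumption; the function name `bom_discrepancies`, the part labels, and the quantities are hypothetical and only loosely modeled on the table earlier on this page.

```python
from collections import Counter


def bom_discrepancies(expected_bom, discovered_parts):
    """Compare expected part counts against the parts found in the rack.

    expected_bom: dict mapping part name -> expected quantity.
    discovered_parts: iterable of part names, one entry per physical unit found.
    Returns a dict of part name -> (expected, actual) for every mismatch.
    """
    found = Counter(discovered_parts)
    diffs = {}
    for part in set(expected_bom) | set(found):
        expected = expected_bom.get(part, 0)
        actual = found.get(part, 0)
        if expected != actual:
            diffs[part] = (expected, actual)
    return diffs


# Hypothetical BOM and inventory: one NVMe drive short, one unexpected cable.
expected_bom = {"CX-6 200G NIC": 16, "7.68TB NVMe": 80}
discovered = ["CX-6 200G NIC"] * 16 + ["7.68TB NVMe"] * 79 + ["spare DAC cable"]

print(bom_discrepancies(expected_bom, discovered))
```

An empty result means the build matches the BOM; any entry shows the expected and actual quantity side by side, covering both missing and unexpected parts.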
Stress testing exercises all major subsystems to expose underlying rack-level issues that may otherwise go undetected (e.g., airflow, thermal performance/analysis, load generation for CPU/GPU/storage/network, power consumption balancing, max power, and failover and redundancy testing).
The functional test produces a log file that serves as a critical baseline reference: it records a serialized inventory and captures the performance of each component. This complete traceability record is a key diagnostic tool during later failure events and supports lot containment in the field.
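One simple way to picture such a baseline log is one JSON record per serialized component, written as one line each so records can be looked up by serial number later. The Python sketch below illustrates that idea only; the record fields, the helper names, and the sample serial numbers and metrics are assumptions made for this example and not the actual log format.

```python
import json
from datetime import datetime, timezone


def baseline_record(serial, component, metrics, passed):
    """Build one log entry for a tested component (hypothetical schema)."""
    return {
        "serial": serial,
        "component": component,
        "metrics": metrics,
        "passed": passed,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


def to_log_lines(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(record, sort_keys=True) for record in records)


def find_by_serial(log_text, serial):
    """Return every logged record for a serial number, e.g. during a failure review."""
    records = (json.loads(line) for line in log_text.splitlines())
    return [record for record in records if record["serial"] == serial]


# Illustrative entries; serial numbers and metric values are made up.
log_text = to_log_lines([
    baseline_record("SN-0001", "nvme", {"read_gb_s": 6.8}, True),
    baseline_record("SN-0002", "nic", {"link_gbps": 200}, True),
])

print(find_by_serial(log_text, "SN-0001"))
```

Keeping the baseline keyed by serial number is what makes lot containment practical: a failing part in the field can be traced back to its original measurements and to other units tested alongside it.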
The team at Deeplearning specializes in building highly customized GPU- or CPU-based HPC clusters, and we will guide you through the entire process from start to finish as your one-stop shop for all your datacenter needs.