IStorM CHAOS Chaos Engineering Platform
Existing stability assurance measures in enterprises focus on preventing known system defects. Faults that are triggered only under specific external conditions lack testing methods and tools, so when they occur in production, response time and cost are hard to control. The Tongchuang Yongyi Chaos Engineering Fault Drill Platform addresses this problem and has received an advanced-level rating in the "Trusted Cloud Chaos Engineering Platform" assessment.
Compared with entry-level chaos engineering products, the Tongchuang Yongyi Chaos Engineering Fault Drill Platform offers a rich, extensible fault library. It ships with the basic fault types, covering almost all known faults, and supports custom faults so that enterprises can extend the library. During a drill, the platform provides full business protection and fast recovery from the drill, and its business sandbox lets drills run in an isolated environment that fully simulates the real one, which makes drills safer.
Drawing on internal fault drills, the platform also provides a large set of predefined fault scenarios covering basic services, microservice governance, cloud-native container orchestration, backup and disaster recovery, and more. Enterprise users can easily extend these scenarios or save past experiments to the scenario library. Scenario orchestration is built on a workflow engine and supports parallel and serial combinations, as well as process definitions for scheduled automatic execution. The drill process is flexible and controllable, and a drill can be terminated at any time.
With the platform, enterprises can verify system stability and uncover weaknesses in systems or applications; verify the fault tolerance and protection mechanisms of microservices; verify that business orchestration and configuration are reasonable; verify the detection capability of monitoring and the effectiveness of alerting; and verify the applicability and availability of disaster recovery and emergency plans. Fault handling thus shifts from reactive to proactive, and business systems become safer.
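The serial and parallel scenario orchestration described above, together with terminating a drill at any time and recovering afterwards, can be illustrated with a minimal Python sketch. This is not the platform's API: the names `Fault`, `Serial`, `Parallel`, and `Drill` are hypothetical and exist only for this illustration, and scheduled execution is left out.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class Fault:
    """Hypothetical fault step: an injection action plus its rollback."""
    name: str
    inject: Callable[[], None]
    recover: Callable[[], None]


@dataclass
class Serial:
    """Run the child steps one after another."""
    steps: List["Node"]


@dataclass
class Parallel:
    """Run the child steps concurrently and wait for all of them."""
    steps: List["Node"]


Node = Union[Fault, Serial, Parallel]


class Drill:
    """Hypothetical drill runner (illustration only, not a real product API)."""

    def __init__(self, root: Node) -> None:
        self.root = root
        self._stop = threading.Event()
        self._lock = threading.Lock()
        self._injected: List[Fault] = []

    def terminate(self) -> None:
        """Ask the drill to stop; steps that have not started are skipped."""
        self._stop.set()

    def _run(self, node: Node) -> None:
        if self._stop.is_set():
            return
        if isinstance(node, Fault):
            node.inject()
            with self._lock:
                self._injected.append(node)
        elif isinstance(node, Serial):
            for step in node.steps:
                self._run(step)
        elif isinstance(node, Parallel):
            workers = max(1, len(node.steps))
            with ThreadPoolExecutor(max_workers=workers) as pool:
                # list() forces completion and re-raises worker exceptions.
                list(pool.map(self._run, node.steps))

    def run(self) -> None:
        try:
            self._run(self.root)
        finally:
            # Always roll back injected faults, newest first.
            with self._lock:
                done, self._injected = self._injected[::-1], []
            for fault in done:
                fault.recover()


if __name__ == "__main__":
    def step(name: str) -> Fault:
        return Fault(name,
                     inject=lambda: print("inject", name),
                     recover=lambda: print("recover", name))

    plan = Serial([step("cpu-load"),
                   Parallel([step("net-delay"), step("disk-full")]),
                   step("pod-kill")])
    Drill(plan).run()
```

Rolling back in a `finally` block means recovery runs whether the drill finishes normally, is terminated, or fails partway through, which mirrors the "terminate at any time" behavior in spirit only.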