Decommissioning, Re-commissioning, and Commissioning New Metadata Nodes in a Working Distributed Data Storage System
| Content Provider | The Lens |
|---|---|
| Abstract | In a running distributed data storage system that actively processes I/Os, metadata nodes are commissioned and decommissioned without taking down the storage system and without interrupting metadata or payload data I/O. The inflow of reads and writes continues without interruption even while metadata nodes are being added and/or removed, and the system's strong consistency remains guaranteed. Commissioning and decommissioning nodes within the running system enables streamlined replacement of permanently failed nodes and lets the system adapt elastically to workload changes. An illustrative distributed barrier logic (the “view change barrier”) drives a multi-state process that coordinates a step-wise progression of the metadata nodes from an old view to a new normal. Rules for I/O handling govern each state until the state-machine loop has been traversed and the system reaches its new normal. |
| Related Links | https://www.lens.org/images/patent/US/20220100710/A1/US_2022_0100710_A1.pdf |
| Language | English |
| Publication Date | 2022-03-31 |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Patent |
| Jurisdiction | United States of America |
| Date Applied | 2021-09-02 |
| Applicant | Commvault Systems Inc |
| Application No. | 202117465691 |
| Claims | 1. A method for decommissioning metadata nodes within a working distributed data storage system that comprises a plurality of storage service nodes, the method comprising: by a first metadata node, receiving read requests and write requests for metadata that is associated with a first range of keys within a set of keys, wherein the first metadata node comprises a first storage service node that executes a metadata subsystem of the distributed data storage system, wherein the set of keys are unique identifiers that ensure strong consistency within the distributed data storage system, wherein each key of the set is owned by exactly one metadata node in the distributed data storage system, wherein the first metadata node: owns the first range of keys, and stores and maintains first metadata files at the first storage service node, and wherein each first metadata file is associated with the first range of keys; by a second metadata node at a second storage service node that is distinct from the first storage service node, receiving read requests and write requests for metadata that is associated with a second range of keys within the set, wherein the second range is distinct from the first range, wherein the second metadata node: owns the second range of keys, and stores and maintains second metadata files at the second storage service node, wherein each second metadata file is associated with the second range of keys, and wherein the second metadata node comprises the second storage service node that executes the metadata subsystem of the distributed data storage system; executing a distributed barrier logic at one of the plurality of storage service nodes, wherein the distributed barrier logic controls a decommissioning of the second metadata node within the distributed data storage system without interrupting servicing of read requests from and write requests to any of the plurality of storage service nodes, wherein the decommissioning re-distributes ownership of the set of keys among metadata nodes in the distributed data storage system; after the decommissioning of the second metadata node is complete, by the first metadata node, receiving read requests and write requests for metadata associated with at least some keys in the second range of keys, wherein second metadata files associated with the at least some keys of the second range are stored at the first storage service node and maintained by the first metadata node; and wherein after the decommissioning of the second metadata node is complete, the second metadata node receives no read requests and no write requests within the distributed data storage system.
2. The method of claim 1, wherein a first instance of the distributed barrier logic is synchronized with other instances of the distributed barrier logic in the distributed data storage system, and wherein each instance of the distributed barrier logic executes in a pod subsystem that is distinct from the metadata subsystem that executes in the first metadata node and in the second metadata node.
3. The method of claim 1, wherein during the decommissioning of the second metadata node, the first metadata node becomes owner of the at least some keys of the second range and the second metadata node no longer owns the keys in the second range of keys.
4. The method of claim 1, wherein the distributed barrier logic controls the decommissioning of the second metadata node by applying a state machine to control a progression of operations at the first metadata node and at the second metadata node without causing interruptions to servicing of read requests and write requests addressed to metadata files associated with the second range.
5. The method of claim 1, wherein the decommissioning of the second metadata node comprises copying the second metadata files associated with the at least some keys of the second range to the first metadata node, and wherein the copying is performed by anti-entropy logic that executes in at least the first metadata node.
6. The method of claim 1, wherein the decommissioning of the second metadata node within the distributed data storage system is completed when (i) all read requests addressed to metadata files associated with the at least some of the keys in the second range are served by the first metadata node and not by the second metadata node, and (ii) all write requests addressed to metadata files associated with the at least some of the keys in the second range are serviced by the first metadata node and not by the second metadata node.
7. The method of claim 1 further comprising: after (i) all read requests addressed to metadata files associated with the at least some of the keys in the second range are served by the first metadata node and not by the second metadata node, and (ii) all write requests addressed to metadata files associated with the at least some of the keys in the second range are serviced by the first metadata node and not by the second metadata node: removing metadata files associated with the at least some of the keys in the second range from one or more of: the second metadata node and storage service nodes among the plurality that are not associated with the at least some of the keys in the second range.
8. The method of claim 1, wherein after the decommissioning of the second metadata node is complete, a storage identifier that uniquely identifies the second metadata node in the distributed data storage system is permanently retired.
9. The method of claim 8, further comprising: re-commissioning the second metadata node, at the second storage service node, into the distributed data storage system with a new storage identifier that is distinct from the storage identifier used by the second metadata node being decommissioned.
10. The method of claim 1, wherein the decommissioning of the second service node includes re-distributing second metadata files associated with the second range that are stored at other storage service nodes that are distinct from the first storage service node and the second storage service node.
11. The method of claim 1, wherein payload data tracked by the second metadata files associated with the second range are not moved in the decommissioning.
12. A distributed data storage system comprising: a plurality of storage service nodes; a first metadata node that is configured to receive read requests and write requests for metadata that is associated with a first range of keys within a set of keys, wherein the first metadata node comprises a first storage service node that executes a metadata subsystem of the distributed data storage system, wherein the set of keys are unique identifiers that ensure strong consistency within the distributed data storage system, wherein each key of the set is owned by exactly one metadata node in the distributed data storage system, wherein the first metadata node: owns the first range of keys, and stores and maintains first metadata files at the first storage service node, and wherein each first metadata file is associated with the first range of keys; a second metadata node at a second storage service node that is distinct from the first storage service node, which is configured to receive read requests and write requests for metadata that is associated with a second range of keys within the set, wherein the second range is distinct from the first range, wherein the second metadata node: owns the second range of keys, and stores and maintains second metadata files at the second storage service node, wherein each second metadata file is associated with one of the keys in the second range of keys, and wherein the second metadata node comprises the second storage service node that executes the metadata subsystem of the distributed data storage system; at least one storage service node that executes a distributed barrier logic, wherein the distributed barrier logic is configured to control a decommissioning of the second metadata node within the distributed data storage system without interrupting servicing of read requests from and write requests to any of the plurality of storage service nodes, wherein the decommissioning re-distributes ownership of the set of keys among metadata nodes in the distributed data storage system; after the decommissioning of the second metadata node is complete, the first metadata node is further configured to: service read requests and write requests for metadata associated with at least some keys in the second range of keys, wherein second metadata files associated with the at least some keys of the second range are stored at the first storage service node and maintained by the first metadata node; and wherein after the decommissioning of the second metadata node is complete, the second metadata node is not authorized to process any read requests and any write requests within the distributed data storage system.
13. The distributed data storage system of claim 12, wherein during the decommissioning of the second metadata node, the first metadata node is configured to become owner of the at least some keys of the second range and the second metadata node no longer owns the keys in the second range of keys.
14. The distributed data storage system of claim 12, wherein the distributed barrier logic is configured to control the decommissioning of the second metadata node by applying a state machine to control a progression of operations at the first metadata node and at the second metadata node without causing interruptions to servicing of read requests and write requests addressed to metadata files associated with the second range.
15. The distributed data storage system of claim 12, wherein the decommissioning of the second metadata node comprises copying the second metadata files associated with the at least some keys of the second range to the first metadata node, and wherein the copying is performed by anti-entropy logic that executes in at least the first metadata node.
16. The distributed data storage system of claim 12, wherein the decommissioning of the second metadata node within the distributed data storage system is completed when (i) all read requests addressed to metadata files associated with the at least some of the keys in the second range are served by the first metadata node and not by the second metadata node, and (ii) all write requests addressed to metadata files associated with the at least some of the keys in the second range are serviced by the first metadata node and not by the second metadata node.
17. The distributed data storage system of claim 12, wherein after (i) all read requests addressed to metadata files associated with the at least some of the keys in the second range are served by the first metadata node and not by the second metadata node, and (ii) all write requests addressed to metadata files associated with the at least some of the keys in the second range are serviced by the first metadata node and not by the second metadata node: metadata files associated with the at least some of the keys in the second range are removed from one or more of: the second metadata node and storage service nodes among the plurality that are not associated with the at least some of the keys in the second range.
18. The distributed data storage system of claim 12, wherein after the decommissioning of the second metadata node is complete, a storage identifier that uniquely identifies the second metadata node in the distributed data storage system is permanently retired.
19. The distributed data storage system of claim 12, wherein the decommissioning of the second service node includes re-distributing second metadata files associated with the second range that are stored at other storage service nodes that are distinct from the first storage service node and the second storage service node.
20. The distributed data storage system of claim 12, wherein payload data tracked by the second metadata files associated with the second range are not moved in the decommissioning. |
| CPC Classification | ELECTRIC DIGITAL DATA PROCESSING (G06F); TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION (H04L); IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING (G06V); PICTORIAL COMMUNICATION, e.g. TELEVISION (H04N) |
| Extended Family | 182-143-262-735-989 119-505-711-588-445 114-704-305-936-707 029-468-698-928-894 055-588-910-221-095 116-326-169-650-487 106-915-828-633-924 141-584-587-044-63X 153-473-065-894-665 030-622-305-962-142 078-196-747-019-306 025-514-106-431-26X 073-713-053-812-810 091-768-596-698-496 017-319-810-914-042 113-678-318-553-489 |
| Patent ID | 20220100710 |
| Inventor/Author | Camargos, Lásaro; Jain, Deepak; Lakshman, Avinash; Naik, Bharat Pundalik |
| IPC | G06F16/182 G06F16/23 |
| Status | Active |
| Owner | Commvault Systems Inc |
| Simple Family | 078-196-747-019-306 182-143-262-735-989 119-505-711-588-445 055-588-910-221-095 106-915-828-633-924 141-584-587-044-63X 017-319-810-914-042 153-473-065-894-665 |
| CPC (with Group) | G06F9/45558 G06F2009/45579 G06F2009/45595 G06F16/182 G06F16/2365 G06F3/0604 G06F3/0626 G06F3/067 G06F3/0629 G06F11/1453 G06F11/2094 G06F11/3476 G06F11/301 G06F11/1425 H04L67/1097 H04L67/1046 H04L67/1048 G06F2009/45583 G06V20/59 G06V10/25 G06V20/54 G06V40/10 G06V40/103 G06V10/34 G06V2201/08 H04N23/66 H04N23/56 G06F18/251 H04N7/181 H04N7/188 |
| Issuing Authority | United States Patent and Trademark Office (USPTO) |
| Kind | Patent Application Publication |
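
Claims 1 and 12 above rest on a simple invariant: every key in the system's key set is owned by exactly one metadata node, and decommissioning a node re-distributes ownership of its key range among the surviving nodes while the payload data itself stays put (claims 11 and 20). The publication does not disclose how ranges are represented or which node inherits them, so the Python sketch below is only a minimal illustration: the `OwnershipMap` class, the sorted-boundary representation, and the inherit-from-predecessor policy are all hypothetical.

```python
import bisect

class OwnershipMap:
    """Hypothetical ownership table: each key maps to exactly one metadata
    node (the claim-1 invariant). Ranges cover a sorted key space, where
    boundaries[i] is the inclusive start of the range owned by owners[i]."""

    def __init__(self, boundaries, owners):
        assert len(boundaries) == len(owners) and boundaries == sorted(boundaries)
        self.boundaries = list(boundaries)
        self.owners = list(owners)

    def owner_of(self, key):
        """Exactly one owner per key: the range whose start precedes it."""
        i = bisect.bisect_right(self.boundaries, key) - 1
        return self.owners[i]

    def decommission(self, node):
        """Hand every range owned by `node` to the neighboring owner --
        one simple re-distribution policy; the patent mandates none."""
        for i, owner in enumerate(self.owners):
            if owner == node:
                self.owners[i] = self.owners[(i - 1) % len(self.owners)]

# Decommissioning B re-distributes its range without moving payload data:
m = OwnershipMap([0, 100, 200], ["A", "B", "C"])
m.decommission("B")
assert m.owner_of(150) == "A"   # B now receives no reads or writes here
```

With ranges starting at 0, 100, and 200 owned by nodes A, B, and C, decommissioning B hands keys 100-199 to A, so `owner_of(150)` returns "A" and B receives no further requests for that range, mirroring the post-decommissioning condition at the end of claim 1.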
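The abstract and claims 4 and 14 describe the distributed barrier logic as a state machine, the "view change barrier," that steps all metadata nodes from an old view to a new normal, with rules for I/O handling governing each state. The publication does not name the states or the barrier interface, so the sketch below is a single-process stand-in under stated assumptions: the state names are invented for illustration, and the barrier simply refuses to advance until every node has acknowledged the current state.

```python
from enum import Enum, auto

class ViewChangeState(Enum):
    # Hypothetical state names; the publication does not enumerate them.
    OLD_VIEW = auto()     # I/O served entirely under the old ownership map
    COPYING = auto()      # new owners pull metadata files for their ranges
    DUAL_WRITE = auto()   # writes applied under both maps; reads from old
    NEW_VIEW = auto()     # I/O served entirely under the new ownership map
    NEW_NORMAL = auto()   # old copies purged; the view change is complete

class ViewChangeBarrier:
    """Single-process stand-in for the distributed 'view change barrier':
    no participant advances until all have acknowledged the current state."""

    def __init__(self, node_ids):
        self.node_ids = frozenset(node_ids)
        self.state = ViewChangeState.OLD_VIEW
        self._acks = set()

    def ack(self, node_id):
        """Called by a node once it has done the work the current state
        requires of it; the last ack moves everyone to the next state."""
        self._acks.add(node_id)
        if self._acks == self.node_ids:
            self._acks = set()
            self._advance()

    def _advance(self):
        states = list(ViewChangeState)
        i = states.index(self.state)
        if i + 1 < len(states):
            self.state = states[i + 1]
```

Traversing the loop once per commissioning or decommissioning event is what the abstract calls reaching the "new normal." In the patent the barrier runs as synchronized instances in a pod subsystem separate from the metadata subsystem (claim 2); a real implementation would replace this single object with that distributed mechanism.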
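Claims 5 and 15 state that the metadata files for the moved keys are copied to their new owner by anti-entropy logic executing in at least the new owner, and claims 6 and 16 define completion: decommissioning is done only when every read and write for those keys is served by the new owner, after which the old copies may be removed (claim 7) and the old node's storage identifier permanently retired (claim 8). The function below is a hedged sketch of the copy step only; the nested-dict file layout and the function name are assumptions of this sketch, not the patent's design.

```python
def anti_entropy_pass(new_owner_files, old_owner_files, moved_keys):
    """Sketch of the claim-5 copy step: the new owner pulls any metadata
    file for a moved key that it does not already store. Files are held
    as {key: {file_path: bytes}} -- an assumption of this sketch."""
    copied = []
    for key in moved_keys:
        have = new_owner_files.setdefault(key, {})
        for path, blob in old_owner_files.get(key, {}).items():
            if path not in have:
                have[path] = blob
                copied.append(path)
    return copied

# Re-running the pass is harmless: it only copies what is still missing,
# so it can repeat until both replicas converge (the anti-entropy property).
old = {"key-42": {"key-42/meta-0": b"\x01", "key-42/meta-1": b"\x02"}}
new = {}
assert len(anti_entropy_pass(new, old, ["key-42"])) == 2
assert anti_entropy_pass(new, old, ["key-42"]) == []
```

Because each pass pulls only files the new owner lacks, it is idempotent and can be scheduled repeatedly during the view change until the moved ranges converge, at which point the completion conditions of claims 6 and 16 can be satisfied.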