Method and Tensor Traversal Engine for Strided Memory Access During Execution of Neural Networks
| Content Provider | The Lens |
|---|---|
| Abstract | A tensor traversal engine in a processor system comprising a source memory component and a destination memory component, the tensor traversal engine comprising: a control signal register storing a control signal for a strided data transfer operation from the source memory component to the destination memory component, the control signal comprising an initial source address, an initial destination address, a first source stride length in a first dimension, and a first source stride count in the first dimension; a source address register communicatively coupled to the control signal register; a destination address register communicatively coupled to the control signal register; a first source stride counter communicatively coupled to the control signal register; and control logic communicatively coupled to the control signal register, the source address register, and the first source stride counter. |
| Related Links | https://www.lens.org/images/patent/US/11550586/B2/US_11550586_B2.pdf |
| Language | English |
| Publisher Date | 2023-01-10 |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Patent |
| Jurisdiction | United States of America |
| Date Applied | 2021-05-26 |
| Agent | Run8 Patent Group, LLC; Peter Miller; Brian T. Chew |
| Applicant | Deep Vision Inc |
| Application No. | 202117331590 |
| Claim | 1. A method for executing a data transfer operation from a source memory component to a destination memory component comprising: writing, to a control signal register, a control signal representing a custom source access pattern comprising a set of source data blocks in the source memory component, the control signal comprising: a base pointer array address; and an initial destination address; accessing a pointer array at the base pointer array address, the pointer array comprising a set of pointer array elements, each pointer array element: representing a source data block in the set of source data blocks; and comprising: a source address for the source data block; and a source block length for the source data block; writing the initial destination address to a destination address register; and for each pointer array element in the set of pointer array elements: writing the source address for the source data block to a source address register; writing the source block length for the source data block to a source block counter; and in response to a current source block count in the source block counter representing at least one source data word remaining in the source data block: transferring a source data word stored at a current source address in the source address register to a current destination address in the destination address register; incrementing the current source address in the source address register; incrementing the current destination address in the destination address register; and decrementing the current source block count in the source block counter.<br>2. The method of claim 1, wherein writing to the control signal register comprises writing, to the control signal register the control signal representing the custom source access pattern comprising the set of source data blocks in the source memory component, the control signal comprising: the base pointer array address; a pointer array length; and the initial destination address.<br>3. The method of claim 2, wherein accessing the pointer array at the base pointer array address comprises loading the pointer array into a pointer array queue based on the base pointer address and the pointer array length.<br>4. The method of claim 3: wherein writing the source address for the source data block to the source address register comprises: reading the source address for the source data block from a first pointer array element in the pointer array queue; and writing the source address for the source data block to the source address register; wherein writing the source block length for the source data block to the source block counter comprises: reading the source block length for the source data block from the first pointer array element in the pointer array queue; and writing the source block length for the source data block to the source block counter; and further comprising, for each pointer array element in the set of pointer array elements, in response to writing the source address for the source data block to the source address register and, in response to writing the source block length for the source data block to the source block counter, dequeuing the first pointer array element from the pointer array queue.<br>5. The method of claim 1, wherein accessing the pointer array at the base pointer array address comprises: writing the base pointer array address to a pointer address register; and writing the pointer array length to a pointer array counter.<br>6. The method of claim 5: wherein writing the source address for the source data block to the source address register comprises: reading a current pointer array address in the pointer address register; reading the source address for the source data block from the pointer array element at the current pointer array address; and writing the source address for the source data block to the source address register; wherein writing the source block length for the source data block to the source block counter comprises: reading the current pointer array address in the pointer address register; reading the source block length for the source data block from the pointer array element at the current pointer array address; and writing the source block length for the source data block to the source block counter; and further comprising, for each pointer array element in the set of pointer array elements, in response to writing the source address for the source data block to the source address register and, in response to writing the source block length for the source data block to the source block counter, incrementing the current pointer array address in the pointer address register.<br>7. The method of claim 1, wherein transferring a source data word stored at a current source address in the source address register to a current destination address in the destination address register comprises: at a first time, loading the source data word from the current source address into a transpose buffer according to a first buffer dimension of the transpose buffer; and at a second time, transferring the source data word from the transpose buffer according to a second buffer dimension of the transpose buffer.<br>8. The method of claim 1, wherein transferring the source data word stored at the current source address to the current destination address comprises: at a first time, loading the source data word from the current source address into a data buffer; and at a second time, transferring the source data word from the data buffer to the current destination address.<br>9. The method of claim 1, wherein writing to the control signal register comprises writing, to the control signal register, the control signal representing the custom source access pattern comprising the set of source data blocks in the source memory component, the control signal: representing a destination storage pattern in the destination memory component: defining a first dimension; and comprising a set of destination blocks; and comprising: the base pointer array address; the initial destination address; and a first destination stride length in the first dimension; in response to the current source block count in the source block counter representing no source data words remaining in the source data block, advancing the current destination address in the destination address register based on the first destination stride length and the current destination address.<br>10. A method for executing a data transfer operation from a source memory component to a destination memory component comprising: writing, to a control signal register, a control signal: representing a custom source access pattern comprising a set of source data blocks in the source memory component; representing a custom destination storage pattern comprising a set of destination blocks in the destination memory component; and comprising: a base source pointer array address; and a base destination pointer array address; accessing a source pointer array at the base source pointer array address, the source pointer array comprising a set of source pointer array elements, each source pointer array element: representing a source data block in the set of source data blocks; and comprising: a source address for the source data block; and a source block length for the source data block; for each source pointer array element in the set of source pointer array elements: writing the source address for the source data block represented by the source pointer array element to a source address register; writing the source block length for the source data block represented by the source pointer array element to a source block counter; and transferring the source data block at a current source address in the source address register to a data buffer based on a current source block length in the source block counter, and in response to the current source block count in the source block counter representing at least one source data word remaining in the source data block: enqueuing a source data word stored at a current source address in the source address register to the data buffer; incrementing the current source address in the source address register; and decrementing the current source block count in the source block counter; accessing a destination pointer array at the base destination pointer array address, the destination pointer array comprising a set of destination pointer array elements, each destination pointer array element: representing a destination block in the set of destination blocks; and comprising: a destination address for the destination block; and a destination block length for the destination data block; and for each destination pointer array element in the set of destination pointer array elements: writing the destination address for the destination block represented by the destination pointer array element to a destination address register; writing the destination block length for the destination data block represented by the destination pointer array element to a destination block counter; and transferring a source data block stored in the data buffer to a current destination address in the destination address register based on a current destination block length in the destination block counter.<br>11. The method of claim 10, wherein transferring the source data block stored in the data buffer to the current destination address in the destination address register based on the current destination block length in the destination block counter comprises, in response to the current destination block count in the destination block counter representing at least one destination word remaining in the destination block: dequeuing a source data word stored in the data buffer to transfer the source data word to the current destination address in the destination memory component; incrementing the current destination address in the destination address register; and decrementing the destination block count in the destination block counter.<br>12. A tensor traversal engine in a processor system comprising a source memory component and a destination memory component, the tensor traversal engine comprising: a control signal register configured to store a control signal for a data transfer operation from the source memory component to the destination memory component, the control signal: representing a custom source access pattern comprising a set of source data blocks in the source memory component; representing a custom destination storage pattern comprising a set of destination blocks in the destination memory component; and comprising: a base source pointer array address; and a base destination pointer array address; a source address register; a source block counter; a source pointer address register; a source pointer array counter; a destination address register; a destination block counter; a data buffer; and control logic communicatively coupled to: the control signal register; the source address register; the source block counter; the source pointer address register; the source pointer array counter; the destination address register; and the destination block counter; and wherein the control logic is configured to, in response to a current source pointer array count in the source pointer array counter representing at least one source pointer array element remaining in the source pointer array: read a current source pointer array address in the source pointer address register; read the source address for a source data block in the set of source data blocks from the source pointer array element at the current source pointer array address: write the source address for the source data block to the source address register; transfer the source data block at the source address for the source data block to the data buffer; increment the current source pointer array address in the source pointer address register; and decrement the current source pointer array count in the source pointer array counter.<br>13. The tensor traversal engine of claim 12, wherein the control logic is configured to execute the data transfer operation by: accessing a source pointer array at the base source pointer array address, the source pointer array comprising a set of source pointer array elements, each source pointer array element: representing a source data block in the set of source data blocks; and comprising: a source address for the source data block; and a source block length for the source data block; for each source pointer array element in the set of source pointer array elements: writing the source address for the source data block represented by the source pointer array element to the source address register; writing the source block length for the source data block represented by the source pointer array element to the source block counter; and transferring the source data block stored at a current source address in the source address register to a data buffer based on a current source block length in the source block counter; accessing a destination pointer array at the base destination pointer array address, the destination pointer array comprising a set of destination pointer array elements, each destination pointer array element: representing a destination block in the set of destination blocks; and comprising: a destination address for the destination block; and a destination block length for the destination data block; and for each destination pointer array element in the set of destination pointer array elements: writing the destination address for the destination block represented by the destination pointer array element to the destination address register; writing the destination block length for the destination data block represented by the destination pointer array element to the destination block counter; and transferring the source block stored in the data buffer to a current destination address in the destination address register based on a current destination block length in the destination block counter.<br>14. The tensor traversal engine of claim 12, wherein the data buffer comprises a transpose buffer.<br>15. The tensor traversal engine of claim 12: wherein the control signal register is configured to store the control signal for a strided data transfer operation from the source memory component to the destination memory component, the control signal: representing the custom source access pattern; representing a strided destination access pattern; and comprising: the base source pointer array address; a destination stride length; a destination stride count; and a destination block count; further comprising a destination stride counter; and wherein the control logic is further communicatively coupled to the destination stride counter.<br>16. The tensor traversal engine of claim 12: wherein the control signal register is configured to store the control signal for a strided data transfer operation from the source memory component to the destination memory component, the control signal: representing a strided source access pattern; representing the custom destination access pattern; and comprising: a source stride length; a source stride count; a source block count; and a base destination pointer array address; further comprising a source stride counter; and wherein the control logic is further communicatively coupled to the source stride counter.<br>17. The tensor traversal engine of claim 12: wherein the control signal register is configured to store the control signal for the data transfer operation from the source memory component to the destination memory component, the control signal: representing the custom source access pattern comprising the set of source data blocks in the source memory component; representing the custom destination storage pattern comprising the set of destination blocks in the destination memory component; and comprising: the base source pointer array address; a source pointer array length; the base destination pointer array address; and a destination pointer array length; further comprising a source pointer array queue configured to store a set of source pointer array elements characterized by the source pointer array length; and further comprising a destination pointer array queue configured to store a set of destination pointer array elements characterized by the destination pointer array length.<br>18. The tensor traversal engine of claim 12: wherein the control signal register is configured to store the control signal for the data transfer operation from the source memory component to the destination memory component, the control signal: representing the custom source access pattern comprising the set of source data blocks in the source memory component; representing the custom destination storage pattern comprising the set of destination blocks in the destination memory component; and comprising: the base source pointer array address; a source pointer array length; the base destination pointer array address; and a destination pointer array length; further comprising: a destination pointer address register; and a destination pointer array counter; and wherein the control logic is further communicatively coupled to: the destination pointer address register; and the destination pointer array counter.<br>19. The tensor traversal engine of claim 18, wherein the control logic is configured to: write the base source pointer array address to the source pointer address register; write the source pointer array length to the source pointer array counter; write the base destination pointer array address to the destination pointer address register; write the destination pointer array length to the destination pointer array counter; and in response to a current destination pointer array count in the destination pointer array counter representing at least one destination pointer array element remaining in the destination pointer array: read a current destination pointer array address in the destination pointer address register; read the destination address for a destination block in the set of destination blocks from the destination pointer array element at the current destination pointer array address; write the destination address for the destination block to the destination address register; transfer the source data block in the data buffer to the destination address in the destination component; increment the current destination pointer array address in the destination pointer address register; and decrement the current destination pointer array count in the destination pointer array counter. |
| CPC Classification | ELECTRIC DIGITAL DATA PROCESSING; COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS |
| Examiner | Ernest Unelus |
| Extended Family | 102-742-874-055-362 095-565-973-177-684 161-663-728-663-989 005-417-084-126-984 014-266-238-814-052 197-042-041-365-154 |
| Patent ID | 11550586 |
| Inventor/Author | Shahim Mohamed; Datla Raju; Hameed Rehan; Kallem Shilpa |
| IPC | G06F9/345 G06F3/06 G06F9/30 G06F9/50 G06F9/54 G06N3/02 G06N3/063 |
| Status | Active |
| Owner | Deep Vision Inc |
| Simple Family | 095-565-973-177-684 102-742-874-055-362 161-663-728-663-989 005-417-084-126-984 014-266-238-814-052 197-042-041-365-154 |
| CPC (with Group) | G06F3/061 G06F3/0659 G06F3/067 G06F9/3455 G06N3/045 G06N3/063 G06F3/0647 G06F3/0655 G06F3/0679 G06F9/30098 G06F9/5016 G06F9/544 G06N3/02 |
| Issuing Authority | United States Patent and Trademark Office (USPTO) |
| Kind | Patent/New European patent specification (amended specification after opposition procedure) |
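
The pointer-array transfer recited in the first method claim, and the two-phase buffered variant recited in the second independent method claim, can be read as gather and gather-then-scatter copies. The following is a minimal software sketch of those loops, not the patented hardware: it assumes flat Python lists stand in for the source and destination memory components, local variables stand in for the address registers and block counters, and a `deque` stands in for the data buffer. The names `PointerElement`, `gather_transfer`, and `gather_scatter` are hypothetical and do not appear in the patent, and the sketch fills the buffer completely before draining it, which the claims do not require.

```python
from collections import deque
from dataclasses import dataclass
from typing import List


@dataclass
class PointerElement:
    """One pointer array element (hypothetical layout): a block's start and size."""
    address: int       # start address of the block
    block_length: int  # number of data words in the block


def gather_transfer(src: List[int], dst: List[int],
                    pointer_array: List[PointerElement],
                    initial_destination_address: int) -> int:
    """Copy each listed source block to consecutive destination addresses.

    Returns the final value of the destination address "register".
    """
    destination_address = initial_destination_address
    for element in pointer_array:
        source_address = element.address    # source address register
        block_count = element.block_length  # source block counter
        while block_count > 0:              # at least one word remaining
            dst[destination_address] = src[source_address]
            source_address += 1
            destination_address += 1
            block_count -= 1
    return destination_address


def gather_scatter(src: List[int], dst: List[int],
                   source_pointers: List[PointerElement],
                   destination_pointers: List[PointerElement]) -> None:
    """Enqueue all source blocks into a buffer, then dequeue into destination blocks."""
    data_buffer = deque()
    for element in source_pointers:
        source_address = element.address
        block_count = element.block_length
        while block_count > 0:
            data_buffer.append(src[source_address])
            source_address += 1
            block_count -= 1
    for element in destination_pointers:
        destination_address = element.address
        block_count = element.block_length  # destination block counter
        while block_count > 0:
            dst[destination_address] = data_buffer.popleft()
            destination_address += 1
            block_count -= 1


src = list(range(100, 116))  # 16 words holding 100..115
pointers = [PointerElement(2, 3), PointerElement(10, 2), PointerElement(5, 1)]

dst = [0] * 8
end = gather_transfer(src, dst, pointers, initial_destination_address=1)
print(dst)  # [0, 102, 103, 104, 110, 111, 105, 0]
print(end)  # 7

out = [0] * 8
gather_scatter(src, out, pointers, [PointerElement(6, 2), PointerElement(0, 4)])
print(out)  # [104, 110, 111, 105, 0, 0, 102, 103]
```

In the second call the destination blocks hold six words in total, matching the six words gathered from the source blocks; the sketch does not guard against a mismatch between the two totals.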