The binary primitive computes the result of a binary elementwise operation between tensors source 0 and source 1 (the variable names follow the standard Naming Conventions):
\[ \dst(\overline{x}) = \src_0(\overline{x}) \mathbin{op} \src_1(\overline{x}), \]
where \(op\) is addition, subtraction, multiplication, division, maximum, or minimum.
The binary primitive does not have a notion of forward or backward propagation.
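The formula above can be sketched as a plain reference loop. This is only an illustration of the elementwise semantics for same-shaped tensors stored as flat arrays; the `BinaryOp` enum and the `apply_binary` and `binary_reference` functions are hypothetical names for this sketch and are not part of the library API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical enum for this sketch; it is not the library's algorithm enum.
enum class BinaryOp { Add, Sub, Mul, Div, Max, Min };

// Applies `op` to a single pair of values.
float apply_binary(BinaryOp op, float a, float b) {
    switch (op) {
        case BinaryOp::Add: return a + b;
        case BinaryOp::Sub: return a - b;
        case BinaryOp::Mul: return a * b;
        case BinaryOp::Div: return a / b;
        case BinaryOp::Max: return std::max(a, b);
        case BinaryOp::Min: return std::min(a, b);
    }
    return 0.0f; // not reached for valid enum values
}

// dst(x) = src0(x) op src1(x) for two tensors of identical shape,
// both stored as flat arrays with the same layout.
std::vector<float> binary_reference(BinaryOp op,
        const std::vector<float> &src0, const std::vector<float> &src1) {
    assert(src0.size() == src1.size());
    std::vector<float> dst(src0.size());
    for (std::size_t i = 0; i < src0.size(); ++i)
        dst[i] = apply_binary(op, src0[i], src1[i]);
    return dst;
}
```

For example, `binary_reference(BinaryOp::Max, a, b)` returns a vector holding the larger of each pair of corresponding elements.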
When executed, the inputs and outputs should be mapped to an execution argument index as specified by the following table.
Primitive input/output | Execution argument index |
---|---|
\(\src_0\) | DNNL_ARG_SRC_0 |
\(\src_1\) | DNNL_ARG_SRC_1 |
\(\dst\) | DNNL_ARG_DST |
binary post-op | DNNL_ARG_ATTR_MULTIPLE_POST_OP(binary_post_op_position) \| DNNL_ARG_SRC_1 |
Source tensors may be broadcast according to the following pattern:
{N,1}x{C,1}x{D,1}x{H,1}x{W,1}:{N,1}x{C,1}x{D,1}x{H,1}x{W,1} -> NxCxDxHxW
This is consistent with PyTorch broadcast semantics.
The following attributes are supported:
Type | Operation | Description | Restrictions |
---|---|---|---|
Attribute | Scales | Scales the corresponding input tensor by the given scale factor(s). | Only one scale per tensor is supported. Input tensors only. |
Post-op | Sum | Adds the operation result to the destination tensor instead of overwriting it. | |
Post-op | Eltwise | Applies an Eltwise operation to the result. | |
Post-op | Binary | Applies a Binary operation to the result. | General binary post-op restrictions |
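
To make the attribute descriptions in the table concrete, here is a minimal sketch of how input scales and a chain of post-ops could combine for an addition. It assumes one particular chain (sum, then a ReLU eltwise, then a binary multiply) purely for illustration; the function name and parameters are hypothetical, and details of the real attributes (such as additional parameters of the sum post-op) are not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: adds two scaled sources, then runs a fixed post-op
// chain (sum -> ReLU eltwise -> binary multiply). Illustration only.
std::vector<float> scaled_add_with_post_ops(
        const std::vector<float> &src0, float scale0,
        const std::vector<float> &src1, float scale1,
        const std::vector<float> &dst_prev,  // existing destination contents
        const std::vector<float> &post_src1) // extra input of the binary post-op
{
    const std::size_t n = src0.size();
    assert(src1.size() == n && dst_prev.size() == n && post_src1.size() == n);
    std::vector<float> dst(n);
    for (std::size_t i = 0; i < n; ++i) {
        // Scales attribute: one scale factor per input tensor.
        float r = scale0 * src0[i] + scale1 * src1[i];
        // Sum post-op: accumulate into the destination instead of overwriting.
        r += dst_prev[i];
        // Eltwise post-op: ReLU is chosen here as an example.
        r = std::max(r, 0.0f);
        // Binary post-op: multiply the result by an additional tensor.
        r *= post_src1[i];
        dst[i] = r;
    }
    return dst;
}
```

The post-ops are applied in the order in which they appear in the loop body, so reordering them in this sketch changes the result.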
The source and destination tensors may have f32, bf16, f16, or s8/u8 data types. The binary primitive supports the following combinations of data types:
Source 0 / 1 | Destination |
---|---|
bf16 | bf16 |
s8, u8, f16, f32 | s8, u8, f16, f32 |
The binary primitive works with arbitrary data tensors. There is no special meaning associated with any of the tensor dimensions.
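Because no dimension carries special meaning, the broadcast pattern given earlier ({N,1}x{C,1}x... for each source) can be sketched uniformly over any number of dimensions. The code below is a reference illustration that assumes dense row-major storage; `Dims`, `broadcast_offset`, and `binary_broadcast` are hypothetical names introduced for this sketch and are not library functions.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using Dims = std::vector<std::size_t>;

// Row-major offset of logical index `idx` in a tensor of shape `dims`.
// A dimension of size 1 is broadcast: its index is pinned to 0.
std::size_t broadcast_offset(const Dims &dims, const Dims &idx) {
    std::size_t off = 0;
    for (std::size_t d = 0; d < dims.size(); ++d)
        off = off * dims[d] + (dims[d] == 1 ? 0 : idx[d]);
    return off;
}

// Computes dst = op(src0, src1) where every source dimension either
// matches the destination dimension or equals 1.
std::vector<float> binary_broadcast(
        const std::vector<float> &src0, const Dims &dims0,
        const std::vector<float> &src1, const Dims &dims1,
        const Dims &dst_dims,
        const std::function<float(float, float)> &op) {
    const std::size_t nd = dst_dims.size();
    assert(dims0.size() == nd && dims1.size() == nd);
    std::size_t total = 1;
    for (std::size_t d = 0; d < nd; ++d) {
        assert(dims0[d] == dst_dims[d] || dims0[d] == 1);
        assert(dims1[d] == dst_dims[d] || dims1[d] == 1);
        total *= dst_dims[d];
    }
    std::vector<float> dst(total);
    Dims idx(nd, 0);
    for (std::size_t i = 0; i < total; ++i) {
        dst[i] = op(src0[broadcast_offset(dims0, idx)],
                    src1[broadcast_offset(dims1, idx)]);
        // Advance the multi-dimensional index, last dimension fastest.
        for (std::size_t d = nd; d-- > 0;) {
            if (++idx[d] < dst_dims[d]) break;
            idx[d] = 0;
        }
    }
    return dst;
}
```

With a 2x3 source 0 and a 1x3 source 1, for instance, the single row of source 1 is reused for both rows of the destination.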
For an f32 destination type, the source 0 and source 1 tensors must have the f32 data type.
Engine | Name | Comments |
---|---|---|
CPU/GPU | Binary Primitive Example | This C++ API example demonstrates how to create and execute a Binary primitive. |
CPU/GPU | Bnorm u8 by binary post-ops example | Bnorm u8 via binary postops example. |