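8. Generic flow API (rte_flow)

8.1. Overview

This API provides a generic means to configure hardware to match specific ingress or egress traffic, alter its fate and query related counters according to any number of user-defined rules.

All of its symbols carry the rte_flow prefix and are defined in rte_flow.h.

  • Matching can be performed on packet data (protocol headers, payload) and on packet properties (e.g. associated physical port, virtual device ID).
  • Possible operations include dropping traffic, diverting it to specific queues, to virtual/physical devices or ports, performing tunnel decapsulation, adding marks and so on.

It is higher-level than the legacy filtering framework, whose functionality it covers, in order to expose a single interface with an unambiguous behavior that is common to all poll-mode drivers (PMDs).

Several methods to migrate existing applications are described in API migration.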

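8.2. Flow rule

8.2.1. Description

A flow rule is the combination of attributes with a matching pattern and a list of actions. Flow rules form the basis of this API.

A flow rule can have several distinct actions (such as counting, encapsulating or decapsulating before redirecting packets to a particular queue), instead of relying on several rules to achieve this and having applications deal with hardware implementation details regarding their order.

Support for different priority levels on a rule basis is provided, for example in order to force a more specific rule to come first when a packet is matched by two rules. However, hardware support for multiple priority levels cannot be guaranteed. When supported, the number of available priority levels is usually low, which is why they can also be implemented in software by PMDs (e.g. missing priority levels may be emulated by reordering rules).

In order to remain as hardware-agnostic as possible, by default all rules are considered to have the same priority, which means that the order between overlapping rules (when a packet is matched by several filters) is undefined.

PMDs may refuse to create overlapping rules at a given priority level when they can be detected (e.g. if a pattern matches an existing filter).

Thus predictable results for a given priority level can only be achieved with non-overlapping rules, using perfect matching on all protocol layers.

Flow rules can also be grouped, and the priority of a flow rule is specific to the group it belongs to. All flow rules in a given group are thus processed either before or after those of another group.

Support for multiple actions per rule may be implemented internally on top of non-default hardware priorities. As a result, both features may not be simultaneously available to applications.

Considering that allowed pattern/action combinations cannot be known in advance and would result in an impractically large number of capabilities to expose, a method is provided to validate a given rule against the current device configuration state.

This enables applications to check at initialization time, before starting their data path, whether the rule types they need are supported. This method can be used at any time; its only requirement is that the resources needed by a rule should exist (e.g. a target RX queue should be configured first).

Each defined rule is associated with an opaque handle managed by the PMD; applications are responsible for keeping it. These handles can be used for queries and rules management, such as retrieving counters or other data and destroying rules.

To avoid resource leaks on the PMD side, handles must be explicitly destroyed by the application before releasing associated resources such as queues and ports.

The following sections cover:

  • Attributes (represented by struct rte_flow_attr): properties of a flow rule such as its direction (ingress or egress) and priority.
  • Pattern item (represented by struct rte_flow_item): part of a matching pattern that matches either specific packet data or traffic properties. It can also describe properties of the pattern itself, such as inverted matching.
  • Matching pattern: traffic properties to look for, a combination of any number of items.
  • Actions (represented by struct rte_flow_action): operations to perform whenever a packet is matched by a pattern.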

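8.2.2. Attributes

8.2.2.1. Attribute: Group

Flow rules can be grouped by assigning them a common group number. Lower values have higher priority. Group 0 has the highest priority.

Although optional, applications are encouraged to group similar rules as much as possible to fully take advantage of hardware capabilities (e.g. optimized matching) and work around limitations (e.g. a single pattern type possibly allowed in a given group).

Note that support for more than a single group is not guaranteed.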

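8.2.2.2. Attribute: Priority

A priority level can be assigned to a flow rule. Like groups, lower values denote higher priority, with 0 as the maximum.

A rule with priority 0 in group 8 is always matched after a rule with priority 8 in group 0.

Group and priority levels are arbitrary and up to the application. They do not need to be contiguous nor start from 0; however, the maximum number varies between devices and may be affected by existing flow rules.

If a packet is matched by several rules of a given group for a given priority level, the outcome is undefined. It can take any path, may be duplicated or even cause unrecoverable errors.

Note that support for more than a single priority level is not guaranteed.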
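The ordering described above (group first, then priority level within the group) can be illustrated with a small standalone sketch. It does not use the DPDK headers; struct rule_order and rule_order_cmp() are hypothetical names introduced only for this illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the two ordering attributes of a flow rule. */
struct rule_order {
    uint32_t group;
    uint32_t priority;
};

/* qsort() comparator: lower group first, then lower priority level. */
int rule_order_cmp(const void *a, const void *b)
{
    const struct rule_order *x = a;
    const struct rule_order *y = b;

    if (x->group != y->group)
        return x->group < y->group ? -1 : 1;
    if (x->priority != y->priority)
        return x->priority < y->priority ? -1 : 1;
    return 0; /* same group and priority: relative order is undefined */
}
```

Sorting an array of such entries with qsort() places a group 0, priority 8 entry ahead of a group 8, priority 0 entry, which matches the statement above.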

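8.2.2.3. Attribute: Traffic direction

Flow rules can apply to inbound and/or outbound traffic (ingress/egress).

Several pattern items and actions are valid and can be used in both directions. At least one direction must be specified.

Specifying both directions at once for a given rule is not recommended but may be valid in a few cases (e.g. shared counters).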

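8.2.3. Pattern item

Pattern items fall in two categories:

  • Matching protocol headers and packet data (ANY, RAW, ETH, VLAN, IPV4, IPV6, ICMP, UDP, TCP, SCTP, VXLAN, MPLS, GRE and so on).
  • Matching meta-data or affecting pattern processing (END, VOID, INVERT, PF, VF, PORT and so on).

Item specification structures are used to match specific values among protocol fields (or item properties). The documentation describes for each item whether it is associated with one and, if so, its type name.

Up to three structures of the same type can be set for a given item:

  • spec: values to match (e.g. a given IPv4 address).
  • last: upper bound of a range for the corresponding fields in spec.
  • mask: bit-mask applied to both spec and last (e.g. in order to match an IPv4 address prefix).

Usage restrictions and expected behavior:

  • Setting either mask or last without spec is an error.
  • Field values in last which are either 0 or equal to the corresponding values in spec are ignored; they do not generate a range. Nonzero values lower than those in spec are not supported.
  • Setting spec and optionally last without mask causes the PMD to use the default mask defined for that item (defined as rte_flow_item_{name}_mask constants).
  • Not setting any of them is equivalent to providing an empty (zeroed) mask.
  • mask is a simple bit-mask applied to spec and last, which may yield unexpected results if not used carefully. For example, if for an IPv4 address field spec provides 10.1.2.3, last provides 10.3.4.5 and mask provides 255.255.0.0, the effective range becomes 10.1.0.0 to 10.3.255.255.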
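The IPv4 example in the last bullet can be checked with a short standalone sketch. It assumes one plausible interpretation of the text (the mask is applied to spec, last and the packet field before the range comparison); ipv4() and field_matches() are hypothetical helpers, not part of rte_flow.

```c
#include <stdbool.h>
#include <stdint.h>

/* Build a host-order IPv4 address from its four dotted-quad bytes. */
uint32_t ipv4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8) | (uint32_t)d;
}

/*
 * Assumed interpretation: the mask is applied to spec, last and the packet
 * field first, then the masked field must lie within the inclusive range.
 * A last value of 0 or equal to spec does not create a range.
 */
bool field_matches(uint32_t value, uint32_t spec, uint32_t last, uint32_t mask)
{
    uint32_t lo = spec & mask;
    uint32_t hi = (last == 0 || last == spec) ? lo : (last & mask);

    value &= mask;
    return value >= lo && value <= hi;
}
```

With spec 10.1.2.3, last 10.3.4.5 and mask 255.255.0.0, this accepts any address from 10.1.0.0 to 10.3.255.255 and rejects 10.0.255.255 and 10.4.0.0, which is the effective range stated above.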

Example of an item specification matching an Ethernet header:

Table 8.1 Ethernet item
Field  Subfield  Value
spec   src       00:01:02:03:04
       dst       00:2a:66:00:01
       type      0x22aa
last   unspecified
mask   src       00:ff:ff:ff:00
       dst       00:00:00:00:ff
       type      0x0000

Non-masked bits stand for any value (shown as ? below). Ethernet headers with the following properties are thus matched:

  • src: ??:01:02:03:??
  • dst: ??:??:??:??:01
  • type: 0x????
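The masking behavior shown in Table 8.1 amounts to a byte-wise comparison restricted to the masked bits. The following standalone sketch illustrates this; masked_equal() is a hypothetical helper and the byte strings used with it are taken from the table as printed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Compare only the bits selected by mask; unmasked bits match any value. */
bool masked_equal(const uint8_t *field, const uint8_t *spec,
                  const uint8_t *mask, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if ((field[i] ^ spec[i]) & mask[i])
            return false;
    }
    return true;
}
```

A source field of aa:01:02:03:bb is accepted with the spec and mask from the table, since only the three middle bytes are compared. A zeroed mask, as used for type, accepts any value.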

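8.2.4. Matching pattern

A pattern is formed by stacking items starting from the lowest protocol layer to match. This stacking restriction does not apply to meta items, which can be placed anywhere in the stack without affecting the meaning of the resulting pattern.

Patterns are terminated by END items.

Examples: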

Table 8.2 TCPv4 as L4
Index Item
0 Ethernet
1 IPv4
2 TCP
3 END

Table 8.3 TCPv6 in VXLAN
Index Item
0 Ethernet
1 IPv4
2 UDP
3 VXLAN
4 Ethernet
5 IPv6
6 TCP
7 END

Table 8.4 TCPv4 as L4 with meta items
Index Item
0 VOID
1 Ethernet
2 VOID
3 IPv4
4 TCP
5 VOID
6 VOID
7 END

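The above example shows how meta items do not affect packet data matching items, as long as those remain stacked properly. The resulting matching pattern is identical to “TCPv4 as L4”.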
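The equivalence between Table 8.2 and Table 8.4 can be sketched as a comparison that ignores VOID entries. This is a standalone illustration only: enum item_type and both functions are hypothetical, and only the fact that END has the numeric value 0 is taken from this document.

```c
#include <stdbool.h>

/* Hypothetical item types; only END being 0 mirrors the API description. */
enum item_type { ITEM_END = 0, ITEM_VOID, ITEM_ETH, ITEM_IPV4, ITEM_TCP };

/* Return the first non-VOID item at or after p (stops at END). */
const enum item_type *skip_void(const enum item_type *p)
{
    while (*p == ITEM_VOID)
        p++;
    return p;
}

/* True when both END-terminated patterns are equal once VOID is ignored. */
bool patterns_equivalent(const enum item_type *a, const enum item_type *b)
{
    for (;;) {
        a = skip_void(a);
        b = skip_void(b);
        if (*a != *b)
            return false;
        if (*a == ITEM_END)
            return true;
        a++;
        b++;
    }
}
```

Comparing the item sequence of Table 8.2 with that of Table 8.4 yields true, whereas a pattern with the IPv4 item removed does not compare equal.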

Table 8.5 UDPv6 anywhere
Index Item
0 IPv6
1 UDP
2 END

If supported by the PMD, omitting one or more protocol layers at the bottom of the stack as in the above example (missing an Ethernet specification) enables looking up anywhere in packets.

It is unspecified whether the payload of supported encapsulations (e.g. VXLAN payload) is matched by such a pattern, which may apply to inner, outer or both packets.

Table 8.6 Invalid, missing L3
Index Item
0 Ethernet
1 UDP
2 END

The above pattern is invalid due to a missing L3 specification between L2 (Ethernet) and L4 (UDP). Doing so is only allowed at the bottom and at the top of the stack.

8.2.5. Meta item types

They match meta-data or affect pattern processing instead of matching packet data directly, and most of them do not need a specification structure. This particularity allows them to be specified anywhere in the stack without causing any side effect.

8.2.5.1. Item: END

End marker for item lists. Prevents further processing of items, thereby ending the pattern.

  • Its numeric value is 0 for convenience.
  • PMD support is mandatory.
  • spec, last and mask are ignored.
Table 8.7 END
Field Value
spec ignored
last ignored
mask ignored

8.2.5.2. Item: VOID

Used as a placeholder for convenience. It is ignored and simply discarded by PMDs.

  • PMD support is mandatory.
  • spec, last and mask are ignored.
Table 8.8 VOID
Field Value
spec ignored
last ignored
mask ignored

One usage example for this type is generating rules that share a common prefix quickly without reallocating memory, only by updating item types:

Table 8.9 TCP, UDP or ICMP as L4
Index Item
0 Ethernet
1 IPv4
2 UDP VOID VOID
3 VOID TCP VOID
4 VOID VOID ICMP
5 END

8.2.5.3. Item: INVERT

Inverted matching, i.e. process packets that do not match the pattern.

  • spec, last and mask are ignored.
Table 8.10 INVERT
Field Value
spec ignored
last ignored
mask ignored

Usage example, matching non-TCPv4 packets only:

Table 8.11 Anything but TCPv4
Index Item
0 INVERT
1 Ethernet
2 IPv4
3 TCP
4 END

8.2.5.4. Item: PF

Matches packets addressed to the physical function of the device.

If the underlying device function differs from the one that would normally receive the matched traffic, specifying this item prevents it from reaching that device unless the flow rule contains an Action: PF. Packets are not duplicated between device instances by default.

  • Likely to return an error or never match any traffic if applied to a VF device.
  • Can be combined with any number of Item: VF to match both PF and VF traffic.
  • spec, last and mask must not be set.
Table 8.12 PF
Field Value
spec unset
last unset
mask unset

8.2.5.5. Item: VF

Matches packets addressed to a virtual function ID of the device.

If the underlying device function differs from the one that would normally receive the matched traffic, specifying this item prevents it from reaching that device unless the flow rule contains an Action: VF. Packets are not duplicated between device instances by default.

  • Likely to return an error or never match any traffic if this causes a VF device to match traffic addressed to a different VF.
  • Can be specified multiple times to match traffic addressed to several VF IDs.
  • Can be combined with a PF item to match both PF and VF traffic.
  • Default mask matches any VF ID.
Table 8.13 VF
Field Subfield Value
spec id destination VF ID
last id upper range value
mask id zeroed to match any VF ID

8.2.5.6. Item: PORT

Matches packets coming from the specified physical port of the underlying device.

The first PORT item overrides the physical port normally associated with the specified DPDK input port (port_id). This item can be provided several times to match additional physical ports.

Note that physical ports are not necessarily tied to DPDK input ports (port_id) when those are not under DPDK control. Possible values are specific to each device, they are not necessarily indexed from zero and may not be contiguous.

As a device property, the list of allowed values as well as the value associated with a port_id should be retrieved by other means.

  • Default mask matches any port index.
Table 8.14 PORT
Field Subfield Value
spec index physical port index
last index upper range value
mask index zeroed to match any port index

8.2.6. Data matching item types

Most of these are basically protocol header definitions with associated bit-masks. They must be specified (stacked) from lowest to highest protocol layer to form a matching pattern.

The following list is not exhaustive, new protocols will be added in the future.

8.2.6.1. Item: ANY

Matches any protocol in place of the current layer, a single ANY may also stand for several protocol layers.

This is usually specified as the first pattern item when looking for a protocol anywhere in a packet.

  • Default mask stands for any number of layers.
Table 8.15 ANY
Field Subfield Value
spec num number of layers covered
last num upper range value
mask num zeroed to cover any number of layers

Example for VXLAN TCP payload matching regardless of outer L3 (IPv4 or IPv6) and L4 (UDP) both matched by the first ANY specification, and inner L3 (IPv4 or IPv6) matched by the second ANY specification:

Table 8.16 TCP in VXLAN with wildcards
Index Item Field Subfield Value
0 Ethernet
1 ANY spec num 2
2 VXLAN
3 Ethernet
4 ANY spec num 1
5 TCP
6 END

8.2.6.2. Item: RAW

Matches a byte string of a given length at a given offset.

Offset is either absolute (using the start of the packet) or relative to the end of the previous matched item in the stack, in which case negative values are allowed.

If search is enabled, offset is used as the starting point. The search area can be delimited by setting limit to a nonzero value, which is the maximum number of bytes after offset where the pattern may start.

Matching a zero-length pattern is allowed, doing so resets the relative offset for subsequent items.

  • This type does not support ranges (last field).
  • Default mask matches all fields exactly.
Table 8.17 RAW
Field  Subfield  Value
spec   relative  look for pattern after the previous item
       search    search pattern from offset (see also limit)
       reserved  reserved, must be set to zero
       offset    absolute or relative offset for pattern
       limit     search area limit for start of pattern
       length    pattern length
       pattern   byte string to look for
last   if specified, either all 0 or with the same values as spec
mask   bit-mask applied to spec values with usual behavior

Example pattern looking for several strings at various offsets of a UDP payload, using combined RAW items:

Table 8.18 UDP payload matching
Index  Item      Field  Subfield  Value
0      Ethernet
1      IPv4
2      UDP
3      RAW       spec   relative  1
                        search    1
                        offset    10
                        limit     0
                        length    3
                        pattern   “foo”
4      RAW       spec   relative  1
                        search    0
                        offset    20
                        limit     0
                        length    3
                        pattern   “bar”
5      RAW       spec   relative  1
                        search    0
                        offset    -29
                        limit     0
                        length    3
                        pattern   “baz”
6      END

This translates to:

  • Locate “foo” at least 10 bytes deep inside UDP payload.
  • Locate “bar” after “foo” plus 20 bytes.
  • Locate “baz” after “bar” minus 29 bytes.

Such a packet may be represented as follows (not to scale):

0                     >= 10 B           == 20 B
|                  |<--------->|     |<--------->|
|                  |           |     |           |
|-----|------|-----|-----|-----|-----|-----------|-----|------|
| ETH | IPv4 | UDP | ... | baz | foo | ......... | bar | .... |
|-----|------|-----|-----|-----|-----|-----------|-----|------|
                         |                             |
                         |<--------------------------->|
                                     == 29 B

Note that matching subsequent pattern items would resume after “baz”, not “bar” since matching is always performed after the previous item of the stack.
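The offset arithmetic of Table 8.18 can be traced with a standalone sketch of a RAW matcher. It is only an illustration of the semantics described above, not PMD code: struct raw_item mirrors the fields of Table 8.17 under hypothetical names, and raw_match() assumes that a limit of 0 means the search area is not delimited.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of the RAW item fields listed in Table 8.17. */
struct raw_item {
    int relative;        /* offset is relative to the end of the previous item */
    int search;          /* scan forward from offset instead of one position */
    long offset;         /* absolute or relative, may be negative if relative */
    size_t limit;        /* max bytes after offset for pattern start, 0 = none */
    size_t length;       /* pattern length */
    const char *pattern; /* byte string to look for */
};

/*
 * Try to match one RAW item. prev_end is the offset just past the previous
 * matched item. Returns the offset just past the match, or -1 if none.
 */
long raw_match(const unsigned char *pkt, size_t pkt_len, long prev_end,
               const struct raw_item *it)
{
    long start = (it->relative ? prev_end : 0) + it->offset;
    long last_start;

    if (start < 0 || it->length > pkt_len ||
        (size_t)start > pkt_len - it->length)
        return -1;
    last_start = (long)(pkt_len - it->length);
    if (!it->search)
        last_start = start; /* single candidate position */
    else if (it->limit != 0 && start + (long)it->limit < last_start)
        last_start = start + (long)it->limit;
    for (long pos = start; pos <= last_start; pos++) {
        if (memcmp(pkt + pos, it->pattern, it->length) == 0)
            return pos + (long)it->length;
    }
    return -1;
}
```

Applied to a UDP payload holding “baz” at offset 10, “foo” at offset 13 and “bar” at offset 36, the three items of Table 8.18 match in sequence, and the final return value points just past “baz”, in line with the note above.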

8.2.6.3. Item: ETH

Matches an Ethernet header.

  • dst: destination MAC.
  • src: source MAC.
  • type: EtherType.
  • Default mask matches destination and source addresses only.

8.2.6.4. Item: VLAN

Matches an 802.1Q/ad VLAN tag.

  • tpid: tag protocol identifier.
  • tci: tag control information.
  • Default mask matches TCI only.

8.2.6.5. Item: IPV4

Matches an IPv4 header.

Note: IPv4 options are handled by dedicated pattern items.

  • hdr: IPv4 header definition (rte_ip.h).
  • Default mask matches source and destination addresses only.

8.2.6.6. Item: IPV6

Matches an IPv6 header.

Note: IPv6 options are handled by dedicated pattern items.

  • hdr: IPv6 header definition (rte_ip.h).
  • Default mask matches source and destination addresses only.

8.2.6.7. Item: ICMP

Matches an ICMP header.

  • hdr: ICMP header definition (rte_icmp.h).
  • Default mask matches ICMP type and code only.

8.2.6.8. Item: UDP

Matches a UDP header.

  • hdr: UDP header definition (rte_udp.h).
  • Default mask matches source and destination ports only.

8.2.6.9. Item: TCP

Matches a TCP header.

  • hdr: TCP header definition (rte_tcp.h).
  • Default mask matches source and destination ports only.

8.2.6.10. Item: SCTP

Matches an SCTP header.

  • hdr: SCTP header definition (rte_sctp.h).
  • Default mask matches source and destination ports only.

8.2.6.11. Item: VXLAN

Matches a VXLAN header (RFC 7348).

  • flags: normally 0x08 (I flag).
  • rsvd0: reserved, normally 0x000000.
  • vni: VXLAN network identifier.
  • rsvd1: reserved, normally 0x00.
  • Default mask matches VNI only.

8.2.6.12. Item: MPLS

Matches an MPLS header.

  • label_tc_s_ttl: label, TC, Bottom of Stack and TTL.
  • Default mask matches label only.

8.2.6.13. Item: GRE

Matches a GRE header.

  • c_rsvd0_ver: checksum, reserved 0 and version.
  • protocol: protocol type.
  • Default mask matches protocol only.

8.2.7. Actions

Each possible action is represented by a type. Some have associated configuration structures. Several actions combined in a list can be assigned to a flow rule. That list is not ordered.

They fall in three categories:

  • Terminating actions (such as QUEUE, DROP, RSS, PF, VF) that prevent processing matched packets by subsequent flow rules, unless overridden with PASSTHRU.
  • Non-terminating actions (PASSTHRU, DUP) that leave matched packets up for additional processing by subsequent flow rules.
  • Other non-terminating meta actions that do not affect the fate of packets (END, VOID, MARK, FLAG, COUNT).

When several actions are combined in a flow rule, they should all have different types (e.g. dropping a packet twice is not possible).

Only the last action of a given type is taken into account. PMDs still perform error checking on the entire list.

Like matching patterns, action lists are terminated by END items.

Note that PASSTHRU is the only action able to override a terminating rule.

Example of action that redirects packets to queue index 10:

Table 8.19 Queue action
Field Value
index 10

Examples of action lists follow. Their order is not significant; applications must consider all actions to be performed simultaneously:

Table 8.20 Count and drop
Index Action
0 COUNT
1 DROP
2 END

Table 8.21 Mark, count and redirect
Index Action Field Value
0 MARK mark 0x2a
1 COUNT
2 QUEUE queue 10
3 END

Table 8.22 Redirect to queue 5
Index Action Field Value
0 DROP
1 QUEUE queue 5
2 END

In the above example, considering both actions are performed simultaneously, the end result is that only QUEUE has any effect.

Table 8.23 Redirect to queue 3
Index Action Field Value
0 QUEUE queue 5
1 VOID
2 QUEUE queue 3
3 END

As previously described, only the last action of a given type found in the list is taken into account. The above example also shows that VOID is ignored.
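The "last action of a given type wins" rule from Table 8.23 can be sketched as a single pass over an END-terminated list. This is a standalone illustration: enum action_type, struct action and effective_queue() are hypothetical names, and only END having the numeric value 0 is taken from this document.

```c
#include <stdint.h>

/* Hypothetical action types; only END being 0 mirrors the API description. */
enum action_type { ACT_END = 0, ACT_VOID, ACT_QUEUE, ACT_COUNT };

struct action {
    enum action_type type;
    uint16_t queue; /* target queue index, meaningful for ACT_QUEUE only */
};

/*
 * Walk an END-terminated list and return the queue index of the last QUEUE
 * action, or -1 if the list contains none. VOID entries have no effect.
 */
int effective_queue(const struct action *actions)
{
    int queue = -1;

    for (; actions->type != ACT_END; actions++) {
        if (actions->type == ACT_QUEUE)
            queue = actions->queue;
    }
    return queue;
}
```

For the list QUEUE 5, VOID, QUEUE 3, END the function returns 3, as in Table 8.23.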

8.2.8. Action types

Common action types are described in this section. Like pattern item types, this list is not exhaustive as new actions will be added in the future.

8.2.8.1. Action: END

End marker for action lists. Prevents further processing of actions, thereby ending the list.

  • Its numeric value is 0 for convenience.
  • PMD support is mandatory.
  • No configurable properties.
Table 8.24 END
Field
no properties

8.2.8.2. Action: VOID

Used as a placeholder for convenience. It is ignored and simply discarded by PMDs.

  • PMD support is mandatory.
  • No configurable properties.
Table 8.25 VOID
Field
no properties

8.2.8.3. Action: PASSTHRU

Leaves packets up for additional processing by subsequent flow rules. This is the default when a rule does not contain a terminating action, but can be specified to force a rule to become non-terminating.

  • No configurable properties.
Table 8.26 PASSTHRU
Field
no properties

Example to copy a packet to a queue and continue processing by subsequent flow rules:

Table 8.27 Copy to queue 8
Index Action Field Value
0 PASSTHRU
1 QUEUE queue 8
2 END

8.2.8.4. Action: MARK

Attaches an integer value to packets and sets PKT_RX_FDIR and PKT_RX_FDIR_ID mbuf flags.

This value is arbitrary and application-defined. Maximum allowed value depends on the underlying implementation. It is returned in the hash.fdir.hi mbuf field.

Table 8.28 MARK
Field Value
id integer value to return with packets

8.2.8.5. Action: FLAG

Flags packets. Similar to Action: MARK without a specific value; only sets the PKT_RX_FDIR mbuf flag.

  • No configurable properties.
Table 8.29 FLAG
Field
no properties

8.2.8.6. Action: QUEUE

Assigns packets to a given queue index.

  • Terminating by default.
Table 8.30 QUEUE
Field Value
index queue index to use

8.2.8.7. Action: DROP

Drop packets.

  • No configurable properties.
  • Terminating by default.
  • PASSTHRU overrides this action if both are specified.
Table 8.31 DROP
Field
no properties

8.2.8.8. Action: COUNT

Enables counters for this rule.

These counters can be retrieved and reset through rte_flow_query(), see struct rte_flow_query_count.

  • Counters can be retrieved with rte_flow_query().
  • No configurable properties.
Table 8.32 COUNT
Field
no properties

Query structure to retrieve and reset flow rule counters:

Table 8.33 COUNT query
Field I/O Value
reset in reset counter after query
hits_set out hits field is set
bytes_set out bytes field is set
hits out number of hits for this rule
bytes out number of bytes through this rule

8.2.8.9. Action: DUP

Duplicates packets to a given queue index.

This is normally combined with QUEUE, however when used alone, it is actually similar to QUEUE + PASSTHRU.

  • Non-terminating by default.
Table 8.34 DUP
Field Value
index queue index to duplicate packet to

8.2.8.10. Action: RSS

Similar to QUEUE, except RSS is additionally performed on packets to spread them among several queues according to the provided parameters.

Note: RSS hash result is stored in the hash.rss mbuf field which overlaps hash.fdir.lo. Since Action: MARK sets the hash.fdir.hi field only, both can be requested simultaneously.

  • Terminating by default.
Table 8.35 RSS
Field Value
rss_conf RSS parameters
num number of entries in queue[]
queue[] queue indices to use

8.2.8.11. Action: PF

Redirects packets to the physical function (PF) of the current device.

  • No configurable properties.
  • Terminating by default.
Table 8.36 PF
Field
no properties

8.2.8.12. Action: VF

Redirects packets to a virtual function (VF) of the current device.

Packets matched by a VF pattern item can be redirected to their original VF ID instead of the specified one. This parameter may not be available and is not guaranteed to work properly if the VF part is matched by a prior flow rule or if packets are not addressed to a VF in the first place.

  • Terminating by default.
Table 8.37 VF
Field Value
original use original VF ID if possible
vf VF ID to redirect packets to

8.2.9. Negative types

All specified pattern items (enum rte_flow_item_type) and actions (enum rte_flow_action_type) use positive identifiers.

The negative space is reserved for dynamic types generated by PMDs during run-time. PMDs may encounter them as a result but must not accept negative identifiers they are not aware of.

A method to generate them remains to be defined.

8.2.10. Planned types

Pattern item types will be added as new protocols are implemented.

Support for variable headers is planned through dedicated pattern items; for example, items matching specific IPv4 options and IPv6 extension headers would be stacked after IPv4/IPv6 items.

Other action types are planned but are not defined yet. These include the ability to alter packet data in several ways, such as performing encapsulation/decapsulation of tunnel headers.

8.3. Rules management

A rather simple API with few functions is provided to fully manage flow rules.

Each created flow rule is associated with an opaque, PMD-specific handle pointer. The application is responsible for keeping it until the rule is destroyed.

Flow rules are represented by struct rte_flow objects.

8.3.1. Validation

Given that expressing a definite set of device capabilities is not practical, a dedicated function is provided to check if a flow rule is supported and can be created.

int
rte_flow_validate(uint8_t port_id,
                  const struct rte_flow_attr *attr,
                  const struct rte_flow_item pattern[],
                  const struct rte_flow_action actions[],
                  struct rte_flow_error *error);

While this function has no effect on the target device, the flow rule is validated against its current configuration state and the returned value should be considered valid by the caller for that state only.

The returned value is guaranteed to remain valid only as long as no successful calls to rte_flow_create() or rte_flow_destroy() are made in the meantime and no device parameter affecting flow rules in any way is modified, due to possible collisions or resource limitations (although in such cases EINVAL should not be returned).

Arguments:

  • port_id: port identifier of Ethernet device.
  • attr: flow rule attributes.
  • pattern: pattern specification (list terminated by the END pattern item).
  • actions: associated actions (list terminated by the END action).
  • error: perform verbose error reporting if not NULL. PMDs initialize this structure in case of error only.

Return values:

  • 0 if flow rule is valid and can be created. Otherwise a negative errno value is returned (rte_errno is also set); the following errors are defined:
  • -ENOSYS: underlying device does not support this functionality.
  • -EINVAL: unknown or invalid rule specification.
  • -ENOTSUP: valid but unsupported rule specification (e.g. partial bit-masks are unsupported).
  • -EEXIST: collision with an existing rule.
  • -ENOMEM: not enough resources.
  • -EBUSY: action cannot be performed due to busy device resources, may succeed if the affected queues or even the entire port are in a stopped state (see rte_eth_dev_rx_queue_stop() and rte_eth_dev_stop()).

8.3.2. Creation

Creating a flow rule is similar to validating one, except the rule is actually created and a handle returned.

struct rte_flow *
rte_flow_create(uint8_t port_id,
                const struct rte_flow_attr *attr,
                const struct rte_flow_item pattern[],
                const struct rte_flow_action actions[],
                struct rte_flow_error *error);

Arguments:

  • port_id: port identifier of Ethernet device.
  • attr: flow rule attributes.
  • pattern: pattern specification (list terminated by the END pattern item).
  • actions: associated actions (list terminated by the END action).
  • error: perform verbose error reporting if not NULL. PMDs initialize this structure in case of error only.

Return values:

A valid handle in case of success, NULL otherwise and rte_errno is set to the positive version of one of the error codes defined for rte_flow_validate().

8.3.3. Destruction

Flow rules destruction is not automatic, and a queue or a port should not be released if any are still attached to them. Applications must take care of performing this step before releasing resources.

int
rte_flow_destroy(uint8_t port_id,
                 struct rte_flow *flow,
                 struct rte_flow_error *error);

Failure to destroy a flow rule handle may occur when other flow rules depend on it, and destroying it would result in an inconsistent state.

This function is only guaranteed to succeed if handles are destroyed in reverse order of their creation.

Arguments:

  • port_id: port identifier of Ethernet device.
  • flow: flow rule handle to destroy.
  • error: perform verbose error reporting if not NULL. PMDs initialize this structure in case of error only.

Return values:

  • 0 on success, a negative errno value otherwise and rte_errno is set.

8.3.4. Flush

Convenience function to destroy all flow rule handles associated with a port. They are released as with successive calls to rte_flow_destroy().

int
rte_flow_flush(uint8_t port_id,
               struct rte_flow_error *error);

In the unlikely event of failure, handles are still considered destroyed and no longer valid but the port must be assumed to be in an inconsistent state.

Arguments:

  • port_id: port identifier of Ethernet device.
  • error: perform verbose error reporting if not NULL. PMDs initialize this structure in case of error only.

Return values:

  • 0 on success, a negative errno value otherwise and rte_errno is set.

8.3.5. Query

Query an existing flow rule.

This function allows retrieving flow-specific data such as counters. Data is gathered by special actions which must be present in the flow rule definition.

int
rte_flow_query(uint8_t port_id,
               struct rte_flow *flow,
               enum rte_flow_action_type action,
               void *data,
               struct rte_flow_error *error);

Arguments:

  • port_id: port identifier of Ethernet device.
  • flow: flow rule handle to query.
  • action: action type to query.
  • data: pointer to storage for the associated query data type.
  • error: perform verbose error reporting if not NULL. PMDs initialize this structure in case of error only.

Return values:

  • 0 on success, a negative errno value otherwise and rte_errno is set.

8.4. Verbose error reporting

The defined errno values may not be accurate enough for users or application developers who want to investigate issues related to flow rules management. A dedicated error object is defined for this purpose:

enum rte_flow_error_type {
    RTE_FLOW_ERROR_TYPE_NONE, /**< No error. */
    RTE_FLOW_ERROR_TYPE_UNSPECIFIED, /**< Cause unspecified. */
    RTE_FLOW_ERROR_TYPE_HANDLE, /**< Flow rule (handle). */
    RTE_FLOW_ERROR_TYPE_ATTR_GROUP, /**< Group field. */
    RTE_FLOW_ERROR_TYPE_ATTR_PRIORITY, /**< Priority field. */
    RTE_FLOW_ERROR_TYPE_ATTR_INGRESS, /**< Ingress field. */
    RTE_FLOW_ERROR_TYPE_ATTR_EGRESS, /**< Egress field. */
    RTE_FLOW_ERROR_TYPE_ATTR, /**< Attributes structure. */
    RTE_FLOW_ERROR_TYPE_ITEM_NUM, /**< Pattern length. */
    RTE_FLOW_ERROR_TYPE_ITEM, /**< Specific pattern item. */
    RTE_FLOW_ERROR_TYPE_ACTION_NUM, /**< Number of actions. */
    RTE_FLOW_ERROR_TYPE_ACTION, /**< Specific action. */
};

struct rte_flow_error {
    enum rte_flow_error_type type; /**< Cause field and error types. */
    const void *cause; /**< Object responsible for the error. */
    const char *message; /**< Human-readable error message. */
};

Error type RTE_FLOW_ERROR_TYPE_NONE stands for no error, in which case remaining fields can be ignored. Other error types describe the type of the object pointed to by cause.

If non-NULL, cause points to the object responsible for the error. For a flow rule, this may be a pattern item or an individual action.

If non-NULL, message provides a human-readable error message.

This object is normally allocated by applications and set by PMDs in case of error. The message points to a constant string which does not need to be freed by the application; however, its pointer can be considered valid only as long as its associated DPDK port remains configured. Closing the underlying device or unloading the PMD invalidates it.
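How such an error object might be filled in can be sketched without the DPDK headers. The types below are local stand-ins mirroring a subset of the definitions above; flow_error_set() and check_queue_index() are hypothetical helpers written for this illustration and are not the library's own functions.

```c
#include <errno.h>
#include <stddef.h>

/* Local stand-ins that mirror a subset of the definitions shown above. */
enum flow_error_type {
    FLOW_ERROR_TYPE_NONE,
    FLOW_ERROR_TYPE_UNSPECIFIED,
    FLOW_ERROR_TYPE_ITEM,
    FLOW_ERROR_TYPE_ACTION,
};

struct flow_error {
    enum flow_error_type type;
    const void *cause;
    const char *message;
};

/* Fill in the error object when the caller supplied one; return -code. */
int flow_error_set(struct flow_error *error, int code,
                   enum flow_error_type type, const void *cause,
                   const char *message)
{
    if (error != NULL) {
        error->type = type;
        error->cause = cause;
        error->message = message; /* constant string, never freed */
    }
    return -code;
}

/* Hypothetical check: reject queue indices at or above nb_queues. */
int check_queue_index(const unsigned int *index, unsigned int nb_queues,
                      struct flow_error *error)
{
    if (*index >= nb_queues)
        return flow_error_set(error, EINVAL, FLOW_ERROR_TYPE_ACTION,
                              index, "queue index out of range");
    return 0;
}
```

On failure the caller receives a negative errno value, and when it passed a non-NULL error object, cause points at the offending field and message at a string literal that needs no freeing. The object is left untouched on success, in line with the rule that it is initialized in case of error only.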

8.5. Caveats

  • DPDK does not keep track of flow rules definitions or flow rule objects automatically. Applications may keep track of the former and must keep track of the latter. PMDs may also do it for internal needs, however this must not be relied on by applications.
  • Flow rules are not maintained between successive port initializations. An application exiting without releasing them and restarting must re-create them from scratch.
  • API operations are synchronous and blocking (EAGAIN cannot be returned).
  • There is no provision for reentrancy/multi-thread safety, although nothing should prevent different devices from being configured at the same time. PMDs may protect their control path functions accordingly.
  • Stopping the data path (TX/RX) should not be necessary when managing flow rules. If this cannot be achieved naturally or with workarounds (such as temporarily replacing the burst function pointers), an appropriate error code must be returned (EBUSY).
  • PMDs, not applications, are responsible for maintaining flow rules configuration when stopping and restarting a port or performing other actions which may affect them. They can only be destroyed explicitly by applications.

For devices exposing multiple ports sharing global settings affected by flow rules:

  • All ports under DPDK control must behave consistently, PMDs are responsible for making sure that existing flow rules on a port are not affected by other ports.
  • Ports not under DPDK control (unaffected or handled by other applications) are user’s responsibility. They may affect existing flow rules and cause undefined behavior. PMDs aware of this may prevent flow rules creation altogether in such cases.

8.6. PMD interface

The PMD interface is defined in rte_flow_driver.h. It is not subject to API/ABI versioning constraints as it is not exposed to applications and may evolve independently.

It is currently implemented on top of the legacy filtering framework through filter type RTE_ETH_FILTER_GENERIC that accepts the single operation RTE_ETH_FILTER_GET to return PMD-specific rte_flow callbacks wrapped inside struct rte_flow_ops.

This overhead is temporarily necessary in order to keep compatibility with the legacy filtering framework, which should eventually disappear.

  • PMD callbacks implement exactly the interface described in Rules management, except for the port ID argument which has already been converted to a pointer to the underlying struct rte_eth_dev.
  • Public API functions do not process flow rules definitions at all before calling PMD functions (no basic error checking, no validation whatsoever). They only make sure these callbacks are non-NULL or return the ENOSYS (function not supported) error.

This interface additionally defines the following helper functions:

  • rte_flow_ops_get(): get generic flow operations structure from a port.
  • rte_flow_error_set(): initialize generic flow error structure.

More will be added over time.

8.7. Device compatibility

No known implementation supports all the described features.

Unsupported features or combinations are not expected to be fully emulated in software by PMDs for performance reasons. Partially supported features may be completed in software as long as hardware performs most of the work (such as queue redirection and packet recognition).

However PMDs are expected to do their best to satisfy application requests by working around hardware limitations as long as doing so does not affect the behavior of existing flow rules.

The following sections provide a few examples of such cases and describe how PMDs should handle them. They are based on limitations built into the previous APIs.

8.7.1. Global bit-masks

Each flow rule comes with its own, per-layer bit-masks, while hardware may support only a single, device-wide bit-mask for a given layer type, so that two IPv4 rules cannot use different bit-masks.

The expected behavior in this case is that PMDs automatically configure global bit-masks according to the needs of the first flow rule created.

Subsequent rules are allowed only if their bit-masks match those; otherwise the EEXIST error code should be returned.
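The expected behavior can be sketched as a small piece of per-device state. This is a standalone illustration under simplifying assumptions (a single 32-bit mask for one layer type, no rule removal); struct global_mask and global_mask_accept() are hypothetical names.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-device state for one layer type. */
struct global_mask {
    bool configured; /* set once the first rule has programmed the mask */
    uint32_t mask;
};

/*
 * Accept a rule's bit-mask: the first rule configures the device-wide mask,
 * later rules must use the same one or are rejected with -EEXIST.
 */
int global_mask_accept(struct global_mask *gm, uint32_t rule_mask)
{
    if (!gm->configured) {
        gm->configured = true;
        gm->mask = rule_mask;
        return 0;
    }
    return gm->mask == rule_mask ? 0 : -EEXIST;
}
```

A real PMD would also need to decide when the global mask may be reconfigured, for instance once all rules using it have been destroyed; that part is left out here.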

8.7.2. Unsupported layer types

Many protocols can be simulated by crafting patterns with the Item: RAW type.

PMDs can rely on this capability to simulate support for protocols with headers not directly recognized by hardware.

8.7.3. ANY pattern item

This pattern item stands for anything, which can be difficult to translate to something hardware would understand, particularly if followed by more specific types.

Consider the following pattern:

Table 8.38 Pattern with ANY as L3
Index Item
0 ETHER
1 ANY num 1
2 TCP
3 END

Knowing that TCP does not make sense with something other than IPv4 and IPv6 as L3, such a pattern may be translated to two flow rules instead:

Table 8.39 ANY replaced with IPV4
Index Item
0 ETHER
1 IPV4 (zeroed mask)
2 TCP
3 END

Table 8.40 ANY replaced with IPV6
Index Item
0 ETHER
1 IPV6 (zeroed mask)
2 TCP
3 END

Note that as soon as an ANY rule covers several layers, this approach may yield a large number of hidden flow rules. It is thus suggested to only support the most common scenarios (anything as L2 and/or L3).
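The translation of one ANY item into an IPV4 pattern and an IPV6 pattern can be sketched as below. The enum values, `MAX_ITEMS`, and `expand_any_l3()` are hypothetical and exist only for this illustration; the sketch assumes a single ANY item standing for exactly one L3 layer and ignores spec, last, and mask.

```c
#include <stddef.h>

/* Hypothetical item types used only by this sketch. */
enum item_type { ITEM_END, ITEM_ANY, ITEM_ETH, ITEM_IPV4, ITEM_IPV6, ITEM_TCP };

#define MAX_ITEMS 8  /* longest pattern handled, END included */

/*
 * Expand a pattern containing ANY as L3 into two concrete patterns.
 * Returns 2 when both patterns were written to out, 0 when the input
 * holds no ANY item, -1 when it is longer than MAX_ITEMS.
 */
static int
expand_any_l3(const enum item_type *in, enum item_type out[2][MAX_ITEMS])
{
    static const enum item_type l3[2] = { ITEM_IPV4, ITEM_IPV6 };
    size_t len = 0;
    int has_any = 0;

    /* Measure the pattern (END included) and look for an ANY item. */
    for (;;) {
        if (len == MAX_ITEMS)
            return -1;
        if (in[len] == ITEM_ANY)
            has_any = 1;
        if (in[len++] == ITEM_END)
            break;
    }
    if (!has_any)
        return 0;

    /* Emit one copy per L3 candidate, substituting the ANY item. */
    for (size_t v = 0; v < 2; v++)
        for (size_t i = 0; i < len; i++)
            out[v][i] = (in[i] == ITEM_ANY) ? l3[v] : in[i];
    return 2;
}
```

Each additional layer covered by ANY would multiply the number of generated patterns, which is why the number of hidden flow rules can grow quickly.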

8.7.4. Unsupported actions

8.7.5. Flow rules priority

While it would naturally make sense, flow rules cannot be assumed to be processed by hardware in the same order as their creation for several reasons:

  • They may be managed internally as a tree or a hash table instead of a list.
  • Removing a flow rule before adding another one can either put the new rule at the end of the list or reuse a freed entry.
  • Duplication may occur when packets are matched by several rules.

For overlapping rules (particularly in order to use Action: PASSTHRU), predictable behavior is only guaranteed by using different priority levels.

Priority levels are not necessarily implemented in hardware, or may be severely limited (e.g. a single priority bit).

For these reasons, priority levels may be implemented purely in software by PMDs.

  • For devices expecting flow rules to be added in the correct order, PMDs may destroy and re-create existing rules after adding a new one with a higher priority.
  • A configurable number of dummy or empty rules can be created at initialization time to save high priority slots for later.
  • In order to save priority levels, PMDs may evaluate whether rules are likely to collide and adjust their priority accordingly.
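The first approach, for devices that match rules in the order they were added, can be sketched as a table kept sorted by priority in software. All names here (`struct rule_table`, `rule_insert()`, `MAX_RULES`) are hypothetical, and the actual destroy and re-create calls to hardware are left out.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_RULES 16

/* Hypothetical software view of a device that matches rules in
 * insertion order and has no priority support of its own. */
struct rule_table {
    uint32_t prio[MAX_RULES];  /* lower value means higher priority */
    int      id[MAX_RULES];    /* rule identifier chosen by the caller */
    size_t   count;
};

/*
 * Insert a rule while keeping the table sorted by priority.
 * Returns the index of the new rule; existing entries that now follow
 * it are the ones to destroy and re-create in hardware, in table order.
 * Returns -1 when the table is full.
 */
static int
rule_insert(struct rule_table *t, int id, uint32_t prio)
{
    size_t pos = 0;

    if (t->count == MAX_RULES)
        return -1;
    /* Skip rules of higher or equal priority. */
    while (pos < t->count && t->prio[pos] <= prio)
        pos++;
    /* Shift lower priority rules down by one slot. */
    for (size_t i = t->count; i > pos; i--) {
        t->prio[i] = t->prio[i - 1];
        t->id[i] = t->id[i - 1];
    }
    t->prio[pos] = prio;
    t->id[pos] = id;
    t->count++;
    return (int)pos;
}
```

A rule appended at the end of the table requires no re-creation at all, so the cost of this emulation depends on how often higher priority rules are added late.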

8.8. Future evolutions

  • A device profile selection function which could be used to force a permanent profile instead of relying on its automatic configuration based on existing flow rules.
  • A method to optimize rte_flow rules with specific pattern items and action types generated on the fly by PMDs. DPDK should assign negative numbers to these in order to not collide with the existing types. See Negative types.
  • Adding specific egress pattern items and actions as described in Attribute: Traffic direction.
  • Optional software fallback when PMDs are unable to handle requested flow rules so applications do not have to implement their own.

8.9. API migration

This section provides an exhaustive list of the deprecated filter types (normally prefixed with RTE_ETH_FILTER_) found in rte_eth_ctrl.h, along with methods to convert them to rte_flow rules.

8.9.1. MACVLAN to ETH → VF, PF

MACVLAN can be translated to a basic Item: ETH flow rule with a terminating Action: VF or Action: PF.

Table 8.41 MACVLAN conversion
Index  Pattern item  Field  Value  Actions
0      ETH           spec   any    VF, PF
                     last   N/A
                     mask   any
1      END                         END

8.9.2. ETHERTYPE to ETH → QUEUE, DROP

ETHERTYPE is basically an Item: ETH flow rule with a terminating Action: QUEUE or Action: DROP.

Table 8.42 ETHERTYPE conversion
Index  Pattern item  Field  Value  Actions
0      ETH           spec   any    QUEUE, DROP
                     last   N/A
                     mask   any
1      END                         END

8.9.3. FLEXIBLE to RAW → QUEUE

FLEXIBLE can be translated to one Item: RAW pattern with a terminating Action: QUEUE and a defined priority level.

Table 8.43 FLEXIBLE conversion
Index  Pattern item  Field  Value  Actions
0      RAW           spec   any    QUEUE
                     last   N/A
                     mask   any
1      END                         END

8.9.4. SYN to TCP → QUEUE

SYN is an Item: TCP rule with only the syn bit enabled and masked, and a terminating Action: QUEUE.

Priority level can be set to simulate the high priority bit.

Table 8.44 SYN conversion
Index  Pattern item  Field  Value  Actions
0      ETH           spec   unset  QUEUE
                     last   unset
                     mask   unset
1      IPV4          spec   unset  END
                     last   unset
                     mask   unset
2      TCP           spec   syn 1
                     mask   syn 1
3      END
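The effect of enabling and masking only the syn bit can be sketched with a spec/mask comparison on the TCP flags byte. The structure and function names below are hypothetical, and only the flags byte of the TCP header is modelled.

```c
#include <stdbool.h>
#include <stdint.h>

#define TCP_FLAG_SYN 0x02  /* SYN bit in the TCP flags byte */

/* Hypothetical reduced TCP item: only the flags byte is modelled. */
struct tcp_flags_item {
    uint8_t spec;  /* expected value */
    uint8_t mask;  /* bits of spec that are significant */
};

/* A field matches when it equals spec on every bit set in mask. */
static bool
tcp_flags_match(const struct tcp_flags_item *item, uint8_t pkt_flags)
{
    return (pkt_flags & item->mask) == (item->spec & item->mask);
}
```

With `spec` and `mask` both set to `TCP_FLAG_SYN`, every other flag is ignored, so a segment carrying SYN together with ACK matches as well as a plain SYN.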

8.9.5. NTUPLE to IPV4, TCP, UDP → QUEUE

NTUPLE is similar to specifying an empty L2, Item: IPV4 as L3 with Item: TCP or Item: UDP as L4 and a terminating Action: QUEUE.

A priority level can be specified as well.

Table 8.45 NTUPLE conversion
Index  Pattern item  Field  Value  Actions
0      ETH           spec   unset  QUEUE
                     last   unset
                     mask   unset
1      IPV4          spec   any    END
                     last   unset
                     mask   any
2      TCP, UDP      spec   any
                     last   unset
                     mask   any
3      END

8.9.6. TUNNEL to ETH, IPV4, IPV6, VXLAN (or other) → QUEUE

TUNNEL matches common IPv4 and IPv6 L3/L4-based tunnel types.

In the following table, Item: ANY is used to cover the optional L4.

Table 8.46 TUNNEL conversion
Index  Pattern item                            Field  Value  Actions
0      ETH                                     spec   any    QUEUE
                                               last   unset
                                               mask   any
1      IPV4, IPV6                              spec   any    END
                                               last   unset
                                               mask   any
2      ANY                                     spec   any
                                               last   unset
                                               mask   num 0
3      VXLAN, GENEVE, TEREDO, NVGRE, GRE, ...  spec   any
                                               last   unset
                                               mask   any
4      END

8.9.7. FDIR to most item types → QUEUE, DROP, PASSTHRU

FDIR is more complex than any other type, and there are several methods to emulate its functionality. It is summarized for the most part in the table below.

A few features are intentionally not supported:

  • The ability to configure the matching input set and masks for the entire device. PMDs should take care of this automatically according to the requested flow rules.

    For example, if a device supports only one bit-mask per protocol type, source/destination IPv4 address bit-masks can be made immutable by the first created rule. Subsequent IPv4 or TCPv4 rules can only be created if they are compatible.

    Note that only protocol bit-masks affected by existing flow rules are immutable, others can be changed later. They become mutable again after the related flow rules are destroyed.

  • Returning four or eight bytes of matched data when using flex bytes filtering. Although a specific action could implement it, it conflicts with the much more useful 32-bit tagging on devices that support it.

  • Side effects on RSS processing of the entire device. Flow rules that conflict with the current device configuration should not be allowed. Similarly, device configuration should not be allowed when it affects existing flow rules.

  • Device modes of operation. “none” is unsupported since filtering cannot be disabled as long as a flow rule is present.

  • “MAC VLAN” or “tunnel” perfect matching modes should be automatically set according to the created flow rules.

  • Signature mode of operation is not defined but could be handled through a specific item type if needed.
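The bit-mask lifecycle described in the first item above (immutable while flow rules depend on it, mutable again once they are destroyed) can be sketched with a simple user count. `struct shared_mask`, `shared_mask_get()`, and `shared_mask_put()` are hypothetical names, a single 32-bit mask is modelled, and the EEXIST value is borrowed from the global bit-masks section as an assumption.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical device-wide bit-mask for one protocol field. */
struct shared_mask {
    uint32_t value;  /* mask currently programmed */
    unsigned users;  /* flow rules relying on it; 0 means mutable */
};

/* Called when creating a rule: take a reference on the mask. */
static int
shared_mask_get(struct shared_mask *m, uint32_t wanted)
{
    if (m->users == 0)
        m->value = wanted;      /* no rule depends on it: reprogram */
    else if (m->value != wanted)
        return -EEXIST;         /* immutable while rules reference it */
    m->users++;
    return 0;
}

/* Called when destroying a rule: drop its reference on the mask. */
static void
shared_mask_put(struct shared_mask *m)
{
    if (m->users > 0)
        m->users--;
}
```

Once the last rule using a mask has been destroyed, the next rule created is free to program a different value.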

Table 8.47 FDIR conversion
Index  Pattern item         Field  Value  Actions
0      ETH, RAW             spec   any    QUEUE, DROP, PASSTHRU
                            last   N/A
                            mask   any
1      IPV4, IPV6           spec   any    MARK
                            last   N/A
                            mask   any
2      TCP, UDP, SCTP       spec   any    END
                            last   N/A
                            mask   any
3      VF, PF (optional)    spec   any
                            last   N/A
                            mask   any
4      END

8.9.8. HASH

There is no counterpart to this filter type because it translates to a global device setting instead of a pattern item. Device settings are automatically set according to the created flow rules.

8.9.9. L2_TUNNEL to VOID → VXLAN (or others)

All packets are matched. This type alters incoming packets to encapsulate them in a chosen tunnel type, and can optionally redirect them to a VF as well.

The destination pool for tag based forwarding can be emulated with other flow rules using Action: DUP.

Table 8.48 L2_TUNNEL conversion
Pattern Actions
0 VOID spec N/A VXLAN, GENEVE, ...
last N/A
mask N/A
1 END VF (optional)
2 END