DOCA SDK Documentation

NVIDIA DOCA IP Fragmentation Application Guide

This document provides an IP Fragmentation implementation on top of the NVIDIA® BlueField® DPU.

Introduction

This IP Fragmentation application is designed to handle IP fragmentation and reassembly efficiently, ensuring minimal processing overhead for non-fragmented packets while maintaining high performance for fragmented packets.

The application operates on a multi-core architecture, uses Receive Side Scaling (RSS) to distribute traffic, and supports configurable modes for flexible port configurations.

Key Features:

  • IP Reassembly:

    • Functionality: The application assembles fragmented packets received on input ports based on their fragmentation headers.

    • Workflow: Upon successful reassembly, the complete packets are forwarded to their destination port.

  • IP Fragmentation:

    • Functionality: Packets exceeding a configurable Maximum Transmission Unit (MTU) are fragmented into smaller packets.

    • Workflow: Fragments are generated with correct headers and forwarded while maintaining efficient resource utilization.

  • Transparent Forwarding: Packets that are neither fragmented nor require reassembly are forwarded directly without additional processing overhead.

  • Inner and Outer Fragmentation Handling: The application supports handling fragmentation at both inner (e.g., encapsulated traffic like GRE, VXLAN) and outer IP layers.

  • Performance Optimization:

    • Designed for high throughput using multi-core processing.

    • Utilizes RSS to distribute traffic across multiple cores, ensuring efficient CPU utilization and scalability.

  • Debuggability with Counters.

  • Dual Operating Modes:

    • Mode 1 (Two Ports): Forwarding between two ports (e.g., Port A ↔ Port B).

    • Mode 2 (Four Ports): Forwarding between Port A and Port B and between Port C and Port D (e.g., Port A ↔ Port B, Port C ↔ Port D), enabling simultaneous independent operations on two traffic streams.

System Design

The IP Fragmentation application can run either on the DPU, serving as an underlying service for host applications, or on the host itself.

Supported Modes:

Dual Port Mode (Bidirectional): Traffic flows bidirectionally between two ports.

Quad Port Mode (Multiport): Independent forwarding between Port A ↔ Port B and between Port C ↔ Port D.

In this mode, each traffic stream is isolated to its own pair of ports.

Notes:

  1. Both diagrams illustrate the flow for a single direction; however, the application operates bidirectionally.

  2. In both modes, non-fragmented or valid-sized packets follow the same flow path without additional actions.

Application Architecture

The IP Fragmentation application runs on top of the DOCA API to send and receive packets.

Operational Workflow

  • Packet Reception and Classification:

    • Traffic is received on the input ports, with RSS distributing flows to available cores.

    • Packets are classified into three categories:

      • Fragmented (Needs Reassembly)

      • Too Large (Needs Fragmentation)

      • Standard Packets (Direct Forwarding)

  • Reassembly:

    • Fragments are buffered until the packet can be reassembled or a configurable aging timeout expires.

    • Once reassembled, the full packet is validated and forwarded.

  • Fragmentation:

    • Large packets exceeding the MTU are fragmented.

    • Fragments are prepared with correct headers, fragment offsets, and sizes.

  • Direct Forwarding:

    • Standard packets are forwarded with minimal processing.

Performance and Scalability

  • Multi-Core Processing:
    The application scales horizontally with the number of CPU cores, with each core handling a subset of traffic flows.

  • RSS Traffic Distribution:
    Receive Side Scaling ensures optimal load balancing across cores.

  • Minimal Overhead:
    Processing logic is optimized for low-latency handling of standard packets while ensuring efficient fragmentation and reassembly operations.

Debugging and Monitoring

The application provides real-time counters that give insight into:

  • Packets processed.

  • Fragments reassembled or fragmented.

  • Errors, such as timeouts on incomplete fragments.

DOCA Libraries

This application leverages the following DOCA libraries:

  • DOCA Flow

For additional information about the used DOCA libraries, please refer to the respective programming guides.

Dependencies

  • NVIDIA BlueField-3 DPU is required.

  • Ubuntu 18.04/20.04/22.04 hosts (x86)

  • Open MPI version 4.1.5rc2 or greater (included in DOCA's installation)

Compiling the Application

Please refer to the NVIDIA DOCA Installation Guide for Linux for details on how to install BlueField-related software.

The installation of DOCA's reference applications contains the sources of the applications, alongside the matching compilation instructions. This allows for compiling the applications "as-is" and provides the ability to modify the sources, then compile a new version of the application.

For more information about the applications as well as development and compilation tips, refer to the DOCA Applications page.

The sources of the application can be found under the application's directory: /opt/mellanox/doca/applications/ip_frag/.

Compiling All Applications

All DOCA applications are defined under a single meson project. So, by default, the compilation includes all of them.

To build all the applications together, run:

cd /opt/mellanox/doca/applications/
meson /tmp/build 
ninja -C /tmp/build


doca_ip_frag is created under /tmp/build/ip_frag/.

Compiling Only the Current Application

  1. To directly build only the IP fragmentation application:

    cd /opt/mellanox/doca/applications/
    meson /tmp/build -Denable_all_applications=false -Denable_ip_frag=true
    ninja -C /tmp/build
    


    doca_ip_frag is created under /tmp/build/ip_frag/.


  2. Alternatively, one can set the desired flags in the meson_options.txt file instead of providing them in the compilation command line:

    1. Edit the following flags in /opt/mellanox/doca/applications/meson_options.txt:

      • Set enable_all_applications to false

      • Set enable_ip_frag to true

    2. The same compilation commands should be used, as were shown in the previous section:

      cd /opt/mellanox/doca/applications/
      meson /tmp/build 
      ninja -C /tmp/build
      


      doca_ip_frag is created under /tmp/build/ip_frag/.

Troubleshooting

Please refer to the NVIDIA DOCA Troubleshooting Guide for any issue you may encounter with the compilation of the DOCA applications.

Running the Application

Prerequisites

  • The Fragmentation application is based on DOCA Flow. Therefore, the user is required to allocate huge pages. 

    $ echo '4096' | sudo tee -a /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
    $ sudo mkdir /mnt/huge
    $ sudo mount -t hugetlbfs -o pagesize=2M nodev /mnt/huge
    


  • FLEX profile number should be manually set to 3 on the system to enable GTP matching:

    $ sudo mlxconfig -d <pcie_address> s FLEX_PARSER_PROFILE_ENABLE=3

Application Execution

The Fragmentation application is provided in source form, hence a compilation is required before the application can be executed.

  1. Application usage instructions:

    Usage: doca_ip_frag [DPDK Flags] -- [DOCA Flags] [Program Flags]
    
    DOCA Flags:
      -h, --help                        Print a help synopsis
      -v, --version                     Print program version information
      -l, --log-level                   Set the (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
      --sdk-log-level                   Set the SDK (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
      -j, --json <path>                 Parse all command flags from an input json file
    
    Program Flags:
      -m, --mode                        Ip_frag application mode. Bidirectional mode forwards packets between a single reassembly port and a single fragmentation port. Multiport mode forwards packets between two pairs of reassembly and fragmentation ports. For more information consult DOCA IP Fragmentation Application Guide. Format: bidir, multiport
      -u, --mtu                         MTU size
      -t, --frag-aging-timeout          Aging timeout of fragments pending packet reassembly in the fragmentation table (in ms)
      -s, --frag-tbl-size               Frag table size, i.e. maximum amount of concurrent defragmentation contexts per worker thread
      -c, --mbuf-chain                  Enable mbuf chaining
    
    
    

    For additional information, please refer to the Command Line Flags section below.

    The above usage printout can be printed to the command line using the -h (or --help) options:

    /tmp/build/ip_frag/doca_ip_frag -- -h
    



  2. CLI example for running the application on BlueField:

    /tmp/build/ip_frag/doca_ip_frag -a auxiliary:mlx5_core.sf.2,dv_flow_en=2,sft_en=1 -a auxiliary:mlx5_core.sf.4,dv_flow_en=2,sft_en=1  -a auxiliary:mlx5_core.sf.3,dv_flow_en=2,sft_en=1  -a auxiliary:mlx5_core.sf.5,dv_flow_en=2,sft_en=1  -l 3-15  --  -l 50 -m multiport
    


  3. CLI example for running the application on the host:

    /tmp/build/ip_frag/doca_ip_frag -l 0-7 -a 0000:08:00.0,dv_flow_en=2 -a 0000:08:00.1,dv_flow_en=2 -- -l 60 -m bidir -t 1000
    


    The PCI addresses (0000:08:00.0, 0000:08:00.1) should match the addresses of the desired PCI devices.


  4. The application also supports a JSON-based deployment mode, in which all command-line arguments are provided through a JSON file: 

    /tmp/build/ip_frag/doca_ip_frag --json [json_file]
    

    For example:

    /tmp/build/ip_frag/doca_ip_frag --json /opt/mellanox/doca/applications/ip_frag/ip_frag_params.json
    


    Before execution, please ensure that the used JSON file contains the correct configuration parameters, and especially the desired PCI addresses needed for the deployment.

Command Line Flags

| Flag Type | Short Flag | Long Flag/JSON Key | Description | JSON Content |
|---|---|---|---|---|
| DPDK flags | a | devices | Add devices to the allow list. This is a mandatory flag. | "devices": [{"device": "pf", "id": "0000:08:00.0", "hws": true}, {"device": "pf", "id": "0000:08:00.1", "hws": true}] |
| DPDK flags | l | core-list | List of cores to be used by the application data path. This is a mandatory flag. | "core-list": "0-1" |
| General flags | h | help | Prints a help synopsis | N/A |
| General flags | v | version | Prints program version information | N/A |
| General flags | l | log-level | Sets the log level for the application: DISABLE=10, CRITICAL=20, ERROR=30, WARNING=40, INFO=50, DEBUG=60, TRACE=70 (requires compilation with TRACE level support) | "log-level": 60 |
| General flags | N/A | sdk-log-level | Sets the SDK log level for the program: DISABLE=10, CRITICAL=20, ERROR=30, WARNING=40, INFO=50, DEBUG=60, TRACE=70 | "sdk-log-level": 40 |
| General flags | j | json | Parse all command flags from an input JSON file | N/A |
| Program flags | m | mode | Execution mode: bidir, multiport. This is a mandatory flag. | "mode": "bidir" |
| Program flags | u | mtu | MTU for fragmentation | "mtu": 1518 |
| Program flags | t | frag-aging-timeout | Fragmentation table aging timeout (in ms) | "frag-aging-timeout": 2 |
| Program flags | s | frag-tbl-size | Fragmentation table size | "frag-tbl-size": 2048 |
| Program flags | c | mbuf-chain | Enable mbuf chaining support on packet reassembly | "mbuf-chain": false |


Refer to DOCA Arg Parser for more information regarding the supported flags and execution modes.

Troubleshooting

Please refer to the NVIDIA DOCA Troubleshooting Guide for any issue you may encounter with the installation or execution of the DOCA applications.

Application Code Flow

  1. Parse application arguments. 

    1. Initialize arg parser resources, register DOCA general and DPDK-specific parameters.

      doca_argp_init();
      doca_argp_set_dpdk_program(dpdk_init);
      


    2. Register IP Fragmentation application parameters.

      ip_frag_register_params();
      


    3. Parse the arguments.

      doca_argp_start();
      
      1. Parse DPDK flags and invoke handler for calling the rte_eal_init() function.

      2. Parse app parameters.

    4. The application uses a different number of ports depending on the mode argument. Set config->nb_ports to the number of available DPDK ports, obtained by calling the rte_eth_dev_count_avail() function.

      ip_frag_dpdk_config_num_ports();
      


    5. The application uses a dedicated queue per core, and the number of data path cores is user-configurable through DPDK arguments. Initialize DPDK ports and queues with the DOCA helper function.

      dpdk_queues_and_ports_init();
      
      1. Initialize DPDK ports.

      2. Create mbuf pool using rte_pktmbuf_pool_create.

      3. Driver initialization – use rte_eth_dev_configure to configure the number of queues.

      4. Rx/Tx queue initialization – use rte_eth_rx_queue_setup and rte_eth_tx_queue_setup to initialize the queues.

      5. Start the port using rte_eth_dev_start.

  2. To support graceful shutdown (including printing statistics and useful debug data), register a signal handler that sets the force_stop variable to terminate the main loop of the data path cores.

    signal(SIGINT, signal_handler);
    signal(SIGTERM, signal_handler);
    


  3. Call the function that implements all app-specific initialization.

    ip_frag();
    
    1. Initialize DOCA Flow that is necessary for RSS.

      init_doca_flow();
      


    2. Reserve an mbuf flag with rte_mbuf_dynflag_register() for saving fragmentation state.

      ip_frag_mbuf_flags_init();
      


    3. Create a per-core mempool for resulting packet fragment indirect mbufs using rte_pktmbuf_pool_create().

      ip_frag_indirect_pool_init();
      


    4. Create per-core data with rte_calloc() and initialize the auxiliary data structures: rte_eth_dev_tx_buffer (with rte_zmalloc_socket(), rte_eth_tx_buffer_init(), and rte_eth_tx_buffer_set_err_callback()) and rte_ip_frag_tbl (with rte_ip_frag_table_create()).

      ip_frag_wt_data_init();
      


    5. Initialize DOCA Flow ports.

      ip_init_doca_flow_ports();
      


    6. Create RSS pipes and entries using the Toeplitz hash function over outer IPv4 header fields.

      ip_frag_rss_pipes_create();
      


      1. Create DOCA Flow pipe config.

        doca_flow_pipe_cfg_create();
        set_flow_pipe_cfg();
        doca_flow_pipe_cfg_set_domain();
        doca_flow_pipe_cfg_set_nr_entries();
        doca_flow_pipe_cfg_set_match();
        


      2. Create the RSS pipe.

        doca_flow_pipe_create();
        


      3. Add RSS pipe entry.

        doca_flow_pipe_add_entry();
        


      4. Process the entry completion.

        doca_flow_entries_process();
        


    7. Start the data path main function on each worker thread.

      rte_eal_mp_remote_launch();
      


    8. The worker thread main loop function forwards packets between sets of ports, fragmenting or reassembling them at the IP layer depending on the mode.

      ip_frag_wt_thread_main();
      


      1. Packet fragmentation algorithm entry-point function.

        ip_frag_wt_fragment();
        


        1. Receive packet burst from rx port.

          rte_eth_rx_burst();
          


        2. Iterate over burst of packets, fragment packets larger than MTU, push all resulting packets to tx buffer.

          ip_frag_pkt{s}_fragment();
          


          1. Parse the packet, store pointers to the parsed packet headers in frag_conn_parser_ctx instance.

            ip_frag_wan_parse();
            


          2. Save L2 header of a packet pending fragmentation into eth_hdr_copy and adjust mbuf data pointer to point to IP header.

            memcpy();
            rte_pktmbuf_adj();
            


          3. Fragment the packet.

            rte_ipv4_fragment_packet();
            


          4. Release the original packet mbuf.

            rte_pktmbuf_free();
            


          5. Fix IP header checksum of resulting fragments.

            ip_frag_ipv4_hdr_cksum();
            


          6. Prepend previously saved L2 header to the resulting fragments.

            rte_pktmbuf_prepend();
            memcpy();
            


          7. Push packet(s) to tx buffer.

            rte_eth_tx_buffer();
            


        3. Send resulting packet tx buffer to tx port.

          rte_eth_tx_buffer_flush();
          


      2. Packet reassembly algorithm entry-point function.

        ip_frag_wt_reassemble();
        


        1. Receive packet burst from rx port.

          rte_eth_rx_burst();
          


        2. Iterate over burst of packets, save fragments into frag table for reassembly, push all resulting packets to tx buffer.

          ip_frag_pkt{s}_reassemble();
          


          1. Parse the packet, store pointers to the parsed packet headers in frag_conn_parser_ctx instance.

            ip_frag_pkt_parse();
            


          2. Parsing result code DOCA_ERROR_AGAIN indicates that the parser has encountered an IP fragment and that re-parsing is required after reassembling the packet. Push the fragment to the frag table for reassembly.

            ip_frag_pkt_reassemble_push();
            


            1. Call the function that prepares the fragment for reassembly by setting all necessary mbuf fields.

              ip_frag_pkt_reassemble_prepare();
              


            2. Push the packet.

              rte_ipv4_frag_reassemble_packet();
              


            3. If mbuf chaining is disabled, then flatten the resulting mbuf chain into a single mbuf.

              ip_frag_pkt_flatten();
              


            4. Push packet(s) to tx buffer.

              rte_eth_tx_buffer();
              


          3. Fix the reassembled packet by recomputing its IP checksums, setting the UDP checksum to 0, and fixing all applicable 'length' fields.

            ip_frag_pkt_fixup();
            


        3. Put expired fragments from the fragmentation table into the death row.

          rte_ip_frag_table_del_expired_entries();
          


        4. Free death row mbufs.

          rte_ip_frag_free_death_row();
          


        5. Send resulting packet tx buffer to tx port.

          rte_eth_tx_buffer_flush();
          


    9. Wait for worker threads to finish.

      rte_eal_mp_wait_lcore();
      


    10. Print statistics and debug data.

      ip_frag_debug_counters_print();
      


    11. Stop DOCA Flow ports.

      stop_doca_flow_ports();
      


    12. Cleanup per-core data.

      ip_frag_wt_data_cleanup();
      


    13. Destroy DOCA Flow.

      doca_flow_destroy();
      


  4. DPDK ports and queues destruction. 

    dpdk_queues_and_ports_fini();
    


  5. DPDK finish.


    dpdk_fini();
    


  6. Arg parser destroy.

    doca_argp_destroy();
    


References

  • /opt/mellanox/doca/applications/ip_frag/

  • /opt/mellanox/doca/applications/ip_frag/ip_frag_params.json
