Bit error counter - how to make it faster

Discussion in 'Cadence' started by gamer, Jun 27, 2007.

  1. gamer

    gamer Guest

    My goal is to implement a bit-error counter targeting 1GHz. The
    datawidth is parametrizable. I started off this way,

    Verilog code:
    ----------
    assign mismatch[datawidth-1:0] = input_data ^ expected_data;
    assign matched = ~( | mismatch); // reduction NOR

    always @(posedge clk or posedge reset) begin
    if (reset)
    error_count = 0;
    else if (~matched)
    for (i=0; i<datawidth; i=i+1)
    error_count = error_count + mismatch;
    end
    ---------/////////----------------------
    The above meets timing for small datawidths (like 8-bits) and starts
    failing to meet timing once the datawidth gets larger. I will be glad
    for suggestions of alternate ways to implement this bit-error counter.
    In practice our datawidths could go upto 256 bits.

    Thanks
     
    gamer, Jun 27, 2007
    #1
  2. The above meets timing for small datawidths (like 8-bits) and starts
    Spread your algorithm to work over more than one clock cycle. Use a pipeline
    stage over eigth clocks. The first stage calculates bit 0 to 7, the second
    calculates 8 to 15 and so on. In an additional stage you sum up all results.
    The result is always 9 clocks behind.

    See http://en.wikipedia.org/wiki/Pipeline_(computer) for a general
    explanation of pipelines.

    hth
    Günther Jehle
     
    Günther Jehle, Jun 27, 2007
    #2


  3. I know no verilog and am not even a C programmer, but ISTM
    that the summation could be highly parallel by using a tree
    of adders (perhaps using carry save to further accelerate
    the summation). Consider it like computing the population
    count followed by an ordinary addition: one can add together
    the even and odd mismatch bits in parallel, then add pairs
    of these sums, etc., finally adding the sum to error_count.

    (BTW, if I understand correctly one should not need the
    "reduction NOR" and the "if (~matched)" since if the
    popcount is zero, adding it to error_count would effectively
    be a no-op, assuming worst-case speed is the only
    consideration. As Gunther Jehle pointed out pipelining
    could provide higher throughput.)


    Paul A. Clayton
    just a technophile
     
    Paul A. Clayton, Jun 27, 2007
    #3
  4. In Virtex-5 the 6-LUTs point to a 6-to-3 reduction instead of full
    adders.
    You can also use BRAMs for an 11 to 4 reduction.
    512 bits at 500MHz might be easier than 256 bits at 1GHz.


    Kolja Sulimma
     
    comp.arch.fpga, Jun 28, 2007
    #4
  5. gamer

    Andy Freeman Guest

    How is the input data created? If the register values come from bit
    set/clear operations, you can keep track of the number of ones with a
    separate counter that is updated appropriately when the register is
    updated.

    For example, suppose that you'd like to know the number of ones in a
    shift register. If the input bit and the bit shifted off (and thus
    removed from the register) are the same, the separate counter is
    unchanged. If the input is 1 and the bit shifted off is 0, the
    counter is incremented. If the input is 0 and the bit shifted off is
    1, the counter is decremented. When the register as a whole is
    cleared, the counter is also cleared.

    It's easy enough to extend this to any update scheme where the number
    of bits that can change each cycle is small, even if those can be in
    arbitrary positions.
     
    Andy Freeman, Jun 28, 2007
    #5
  6. gamer

    YUAN, Nan Guest

    I guess the critical path of timming is summation chain.
    There're serveral binary tricks to get number of bit "1" in a word,
    take C code of 32-bit word for example, there's only 5 summations
    instead of 32.

    unsigned int bitcount32(unsigned int x)
    {
    x = (x & 0x55555555UL) + ((x >> 1) & 0x55555555UL); // 0-2 in 2
    bits
    x = (x & 0x33333333UL) + ((x >> 2) & 0x33333333UL); // 0-4 in 4
    bits
    #if 1
    // Version 1
    x = (x & 0x0f0f0f0fUL) + ((x >> 4) & 0x0f0f0f0fUL); // 0-8 in 8
    bits
    x = (x & 0x00ff00ffUL) + ((x >> 8) & 0x00ff00ffUL); // 0-16 in 16
    bits
    x = (x & 0x0000ffffUL) + ((x >> 16) & 0x0000ffffUL); // 0-31 in 32
    bits
    return x;
    #else
    // Version 2
    x = (x + (x >> 4)) & 0x0f0f0f0fUL; // 0-8 in 4 bits
    x += x >> 8; // 0-16 in 8 bits
    x += x >> 16; // 0-32 in 8 bits
    return x & 0xff;
    #endif
    }
     
    YUAN, Nan, Jun 30, 2007
    #6
Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.