Raptor 3.0.0-rc.1
A fast and space-efficient pre-filter for querying very large collections of nucleotide sequences
 
adjust_seed.hpp
// --------------------------------------------------------------------------------------------------
// Copyright (c) 2006-2023, Knut Reinert & Freie Universität Berlin
// Copyright (c) 2016-2023, Knut Reinert & MPI für molekulare Genetik
// This file may be used, modified and/or redistributed under the terms of the 3-clause BSD-License
// shipped with this file and also available at: https://github.com/seqan/raptor/blob/main/LICENSE.md
// --------------------------------------------------------------------------------------------------

#pragma once

#include <cstdint>

namespace raptor
{

/*!\brief Adjust the default seed such that it does not interfere with the IBF's hashing.
 * \param kmer_size The k-mer size in use. For gapped shapes, this corresponds to the number of set bits (count()).
 *                  Must be in [1, 32]; other values yield an invalid shift amount.
 * \details
 *
 * The hashing used with the IBF assumes that the input values are uniformly distributed.
 * However, we use a 64-bit seed, and a k-mer only occupies the lowest `2 * kmer_size` bits (2 bits per nucleotide).
 * Unless `kmer_size` is 32, not all 64 bits of the k-mers change.
 * Hence, we need to shift the seed to the right by `64 - 2 * kmer_size` bits.
 *
 * For example, using 2-mers and an 8-bit seed, only the last 4 bits of the k-mer values change:
 *
 * ```
 * seed = 1111'1011
 * kmer = 0000'XXXX
 * ```
 *
 * Without the shift, `seed XOR kmer` would always have 4 leading ones.
 * Shifting the seed right by 4 bits gives `0000'1111`, which only affects the bits that vary.
 */
static inline constexpr uint64_t adjust_seed(uint8_t const kmer_size) noexcept
{
    return 0x8F3F73B5CF1C9ADEULL >> (64u - 2u * kmer_size);
}

} // namespace raptor
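
The effect of the shift can be checked with a small standalone sketch. It repeats `adjust_seed` (without the namespace) so that it compiles on its own, and adds `toy_adjust_seed` and `check_toy_example`, which are hypothetical helpers that are not part of Raptor; they mirror the 8-bit example from the comment above.

```cpp
#include <cassert>
#include <cstdint>

// Copy of adjust_seed from the listing, repeated so that this sketch is self-contained.
constexpr uint64_t adjust_seed(uint8_t const kmer_size) noexcept
{
    return 0x8F3F73B5CF1C9ADEULL >> (64u - 2u * kmer_size);
}

// Hypothetical 8-bit analogue (not part of Raptor): keep only the top 2 * kmer_size bits of the seed.
// Assumes kmer_size is in [1, 4].
constexpr uint8_t toy_adjust_seed(uint8_t const seed, uint8_t const kmer_size) noexcept
{
    return static_cast<uint8_t>(seed >> (8u - 2u * kmer_size));
}

// A 32-mer uses all 64 bits, so the seed is not shifted at all.
static_assert(adjust_seed(32) == 0x8F3F73B5CF1C9ADEULL, "32-mers keep the full seed");
// A 2-mer occupies 4 bits, so only the top 4 bits of the seed (0x8) remain.
static_assert(adjust_seed(2) == 0x8ULL, "2-mers keep the top 4 bits of the seed");
// Toy case from the comment: 1111'1011 shifted right by 4 is 0000'1111.
static_assert(toy_adjust_seed(0b1111'1011, 2) == 0b0000'1111, "toy seed is shifted by 4 bits");

// Runtime check: XOR-ing any 2-mer value with the adjusted toy seed never sets the upper 4 bits.
inline void check_toy_example()
{
    uint8_t const seed = toy_adjust_seed(0b1111'1011, 2);
    for (uint8_t kmer = 0; kmer < 16; ++kmer)
        assert(((seed ^ kmer) & 0xF0) == 0);
}
```

The `static_assert`s are evaluated at compile time, which works because both functions are `constexpr`; calling `check_toy_example()` from a test or `main` runs the loop over all sixteen 2-mer values.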