Quaternion modeling of the helical path for analysis of the shape of the DNA molecule

The three-dimensional shape of a DNA molecule is a key property influencing its functional specificity and the nature of its molecular interactions. The characteristic shape that a DNA molecule adopts under given conditions is a manifestation of its micromechanical and structural features, which are sequence-dependent. DNA shape-related properties can therefore be predicted. A number of models have been designed to describe intrinsic DNA curvature, incorporating a set of helical parameters that can be applied to rapid three-dimensional reconstruction of DNA structures. Alternatively, the required base pair parameters can be computed from publicly available information about atomic DNA structures. Further, taking the base pairs as rigid bodies, their relative location in space can be estimated from these parameters. Matrices are the most common way to implement rigid body transformations and are widely used in the modeling of DNA structures. Quaternions are a more straightforward and robust alternative to matrices. Unit quaternions can represent only a rotation, whereas dual quaternions combine rotation and translation into a single state. In the present guide, the algebra of unit and dual quaternions is applied for the first time to modeling of the DNA helical path, based on conformational parameters of the base pair steps. Although dual quaternions are preferable for modeling DNA structure in detail, unit quaternions are sufficient to predict the DNA trajectory and to carry out all subsequent calculations of DNA shape features. To analyze DNA shape and chain statistics, and to predict the micromechanical properties of DNA molecules from the coordinates of the helical path, widely used as well as original algorithms for computing DNA curvature, radius of gyration, persistence length and phasing of DNA bends are described. Taken together, these algorithms will be useful both in the in silico analysis of relatively short DNA fragments and in topological mapping of whole genomes.

Structure and interaction of macromolecules
Vavilov Journal of Genetics and Breeding • 2017 • 21 • 8
The shape of DNA is sequence-dependent and is determined by the intrinsic curvature and flexibility of the molecule, which is driven by external forces. DNA bending and curving play a crucial role in many important biological processes, including recombination, replication and the excision of damaged nucleotides. They are also important for the transcriptional regulation of numerous prokaryotic and eukaryotic genes, as well as for DNA-protein recognition and interactions. It is now well established that intrinsic DNA curvature is a major determinant of nucleosome organization and positioning, facilitating chromosome folding and DNA packaging in the nucleoid, promoting the appropriate mode of supercoiling and protecting the prokaryotic genome from phage integration (see References in the Supplementary material).
Predictions of the sequence-dependent mechanical properties of DNA are important for understanding many biological processes associated with the various DNA-protein interactions, including the phenomenon of "indirect readout". The micromechanical characteristics of the DNA molecule can be described by a variety of models, including the remarkably effective worm-like chain (WLC) model for semiflexible polymers (Kratky, Porod, 1949). In this model, the bending stiffness of a molecule is described by its persistence length. However, the WLC model is valid only for isotropic, intrinsically straight, homogeneous polymers and cannot be applied in its original form to DNA molecules, which contain non-Gaussian deformations such as intrinsic bends (Schellman, Harvey, 1995; Rivetti et al., 1998).
The intrinsic curvature of DNA is mainly determined by the length and localization of adenine tracts (A-tracts, A_nT_m, n + m ≥ 4). The helical phasing of A-tracts and other bend-related sequences long formed the basis of the central concept of intrinsic DNA curvature. More than 30 years ago it was demonstrated how the localization of local bends affects the global curvature of a DNA molecule (Wu, Crothers, 1984; Hagerman, 1985; Koo et al., 1986). In particular, it has been shown that macroscopic DNA curvature is strongly affected by the phasing of local bends. Since the bends produced by A-tracts have a directional preference, bends that recur in phase with the helical screw add coherently and significantly increase the macroscopic DNA curvature, whereas in "straighter" DNA molecules the systematic bends are almost exactly out of phase (Koo et al., 1986). The discrete Fourier transform (DFT) is used to detect the periodicity of any property along a sequence through calculation of the Fourier power spectrum. However, this method is valid only for planar systematic bends that are oriented in the same direction or, as is often the case, for bends determined by repeats of a sequence motif. It is therefore important to predict the macroscopic DNA curvature from the phase of local bends, regardless of their origin.
Since DNA conformation has been shown to be sequence-dependent, a series of models has been developed for in silico prediction of DNA shape from sequence-dependent parameters of base pair steps in di-, tri- and tetranucleotide contexts. These base pair step (often called wedge) parameters were derived by many different methods, such as circularization, gel mobility, DNase I digestion and nucleosome positioning data, X-ray crystallography, nuclear magnetic resonance (NMR) spectroscopy, conformational energy minimization and computer simulation (see References in the Supplementary material). Nonetheless, there is no complete consensus on the sequence-dependent parameters that are optimal for accurate prediction of DNA shape.
Base pair orientation in DNA helical structures is described by a set of intra- and inter-base-pair parameters, each comprising three rotational and three translational parameters. The intra-base-pair parameters determine the conformation of the bases in the local coordinate frame of a base pair; hence, they do not affect the trajectory of the DNA helical path. The set of inter-base-pair geometrical parameters consists of three rotations, tilt (τ), roll (ρ) and twist (Ω), and three translations, shift (Dx), slide (Dy) and rise (Dz), about and along the X, Y and Z axes, respectively (Fig. 1), in the standard coordinate frame approved by the Cambridge convention (Dickerson, 1989). The inter-base-pair parameters are thus used to construct low-resolution (coarse-grained) three-dimensional DNA structures. The translational parameters shift and slide define the local displacement between adjacent base pairs, √(slide² + shift²). The magnitude of twist characterizes the degree of torsional twisting of the double helix. Roll and tilt make the main contribution to bending of the DNA molecule, and the total local bend of each dinucleotide step can be represented as √(tilt² + roll²). Thus, the DNA trajectory modeled in this way is a spatial curve whose shape is mainly determined by the accumulation of local bends (of specific magnitude and direction), local displacements, translation by rise and rotation by twist. In a recent study, a genome-scale computational analysis of DNA curvature based on three-dimensional trajectories of DNA molecules, calculated from a set of inter-base-pair conformational parameters, was carried out for Arabidopsis and rice (Masoudi-Nejad et al., 2011).
The rigid transform has six degrees of freedom, consisting of three translational and three rotational components. Matrices are the most popular means of storing and combining these transforms, and the matrix transformation technique is a standard method for calculating the DNA path (De Santis et al., 1990; Bolshoy et al., 1991; Babcock et al., 1994; Liu, Beveridge, 2001). Quaternions are a more straightforward and robust alternative to matrices. They are more compact, reduce the volume of algebra and minimize computation. However, a unit quaternion can represent only a rotation, without translation. Dual quaternions unify translation and rotation into a single state.
In the present guide to DNA shape analysis, the use of unit and dual quaternions for coarse-grained modeling of three-dimensional DNA structures is described in detail. Proven algorithms for efficient prediction and analysis of the shape of DNA molecules are described below. The methods discussed are fast and robust, making them appropriate for rapid prediction of curvature- and bendability-dependent DNA shape at any scale.

Obtaining base pair step parameters
The DNA helix trajectory is calculated from the average values of local helical parameters. These parameters can be obtained from various sources, such as scientific papers or the base pair parameters database (DiProDB, http://diprodb.leibniz-fli.de), or alternatively they can be deduced from three-dimensional DNA structures. The last method is optimal, since it allows base pair parameters to be deduced from the DNA structures best suited to the specific problem (with a specific nucleotide content or conformation, obtained under different conditions, etc.). This significantly increases the flexibility of DNA molecular modeling and expands the possibilities for further investigation. In particular, specific sets of base pair parameters can be used for modeling and analysis of unusual DNA structures that differ from canonical B-DNA.
As the Protein Data Bank (PDB) is currently the largest source of information about the three-dimensional structures of biological molecules, including nucleic acids (Berman et al., 2003), it is sufficient as a source of primary data for estimating DNA base pair geometry. Currently, more than 1700 DNA structures are publicly available in the PDB. They are represented in different conformations obtained using a range of methods, such as X-ray crystallography (at various resolutions), solution NMR spectroscopy, neutron diffraction and others. However, many DNA molecules are chemically modified or represented in complex with ligands. It seems clear that DNA conformation is substantially affected by crystal packing, base modification and interactions of the DNA with other molecules or ions. Thus, the most suitable base pair conformational parameters for modeling DNA structures are provided by NMR structures determined in water-only solutions. It was shown previously that, in contrast to parameters from X-ray crystallographic analysis, helical parameters derived from NMR structures can correctly predict the curvature of DNA molecules (Gabrielian, Pongor, 1996). However, it should be noted that until the late 1990s the accuracy of the NMR method was not sufficient to evaluate small intrinsic bends of the DNA axis (Vermeulen et al., 2000). A full set of helical parameters can be deduced from the selected DNA structures using the Curves+ (Lavery et al., 2009) or 3DNA (Lu, Olson, 2003) programs.
As a rule, each DNA structure from the PDB is represented by 5-10 conformers possessing the lowest energy. For this reason, base pair parameters deduced from individual structures should be averaged over all conformers, and the standard deviation of each parameter should be evaluated (the standard deviations can later be used for estimation of the dynamic persistence length). To eliminate errors associated with "end effects", only the central base pairs should be used in the computation. Since the structural properties of dinucleotides in the DNA molecule depend upon the flanking base pairs, the parameters of base pair steps should wherever possible be applied in a tetranucleotide context. (The base pair step parameters used here for reconstruction of the DNA trajectories are provided in the Supplementary material.)
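The averaging over conformers described above can be sketched as follows. This is a minimal illustration rather than the procedure used in this work: the function name `average_step_parameters`, the dictionary layout and the example values are hypothetical, and the sample standard deviation is assumed.

```python
from statistics import mean, stdev

def average_step_parameters(conformers):
    """Average each step parameter over conformers.

    conformers: list of dicts, one per conformer, mapping a parameter
    name (e.g. "twist") to its value for the same base pair step.
    Returns {name: (mean, sample standard deviation)}.
    """
    summary = {}
    for name in conformers[0]:
        values = [c[name] for c in conformers]
        # A single conformer has no spread to estimate.
        sd = stdev(values) if len(values) > 1 else 0.0
        summary[name] = (mean(values), sd)
    return summary

# Hypothetical values (degrees) of one central step in three conformers.
example = [
    {"twist": 35.0, "roll": 1.0, "tilt": 0.0},
    {"twist": 36.0, "roll": 2.0, "tilt": 0.5},
    {"twist": 37.0, "roll": 3.0, "tilt": 1.0},
]
stats = average_step_parameters(example)
```

The standard deviations of roll and tilt returned here are the quantities that enter the later estimate of the dynamic persistence length.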

DNA path modeling
According to quaternion algebra, the vector function of the DNA sequence describing the position of the n-th base pair of the DNA path in the Cartesian coordinate system can be represented by Eq. (1) [equation missing in source], in which the rotation term is the consecutive quaternion product of the quaternions Q_{Ω_k ρ_k τ_k}, each realizing successive rotations about the Z, Y and X axes by the angles defined by the twist (Ω), roll (ρ) and tilt (τ) parameters of the neighboring base pairs (k − 1, k) [equation missing in source]. The vector V contains the coordinates of the i-th element (atom, molecule, point or any structure) in the intrinsic coordinate frame of the dinucleotide step; for example, the coordinates of the idealized phosphate of the first strand are contained in the vector V(Dx_i, 8.91 + Dy_i, 2.08 + Dz_i). The quaternion of rotation and the translational vector together define the rigid-body transformation between two successive base pair steps.
The initial displacement by shift (Dx), slide (Dy) and rise (Dz) for each transformation can be represented simply using dual quaternions: [equation missing in source]. The dual quaternion Qe_i contains the coordinates of any element associated with the i-th base pair (for example, the coordinates of the base pair center are contained in the dual quaternion Qe_i = [1,0,0,0][0,0,0,0]; the coordinates of the idealized phosphate of the first strand, in Qe_i = [1,0,0,0][0,0,8.91,2.08]). The use of dual quaternions is a fast, simple and robust approach to molecular modeling. For example, to model the base pair conformation determined by such local (intra-base-pair) parameters as shear (Sx), stretch (Sy), stagger (Sz), buckle (κ), propeller (π) and opening (σ), the dual quaternion Qe_i for each base in the i-th pair can be represented as: [equation missing in source]. As with the dinucleotide step parameters, the local geometric parameters of base pairs can be obtained from articles or deduced from published three-dimensional DNA structures using the Curves+ or 3DNA programs. The atomic coordinates of each base and of the sugar-phosphate backbone can be obtained from relevant publications (Clowney et al., 1996; Gelbin et al., 1996; Parkinson et al., 1996; Olson et al., 2001) or structure databases.
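How a dual quaternion carries rotation and translation as a single state can be illustrated with a short sketch. It assumes the common convention in which a rigid transform "rotate by r, then translate by t" is stored as the pair (r, ½·t·r); the helper names (`qmul`, `dq_from_rt`, `dq_mul`, `dq_translation`) are hypothetical and the equations of this paper may use a different ordering.

```python
import math

def qmul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw*bw - ax*bx - ay*by - az*bz,
            aw*bx + ax*bw + ay*bz - az*by,
            aw*by - ax*bz + ay*bw + az*bx,
            aw*bz + ax*by - ay*bx + az*bw)

def axis_angle(axis, degrees):
    """Unit quaternion for a rotation about a unit axis."""
    half = math.radians(degrees) / 2.0
    s = math.sin(half)
    return (math.cos(half), axis[0]*s, axis[1]*s, axis[2]*s)

def dq_from_rt(rot, trans):
    """Dual quaternion (real, dual) for 'rotate by rot, then translate'."""
    t = (0.0, trans[0], trans[1], trans[2])
    dual = tuple(0.5 * c for c in qmul(t, rot))
    return (rot, dual)

def dq_mul(a, b):
    """Compose two rigid transforms: b is applied first, then a."""
    real = qmul(a[0], b[0])
    dual = tuple(p + q for p, q in zip(qmul(a[0], b[1]), qmul(a[1], b[0])))
    return (real, dual)

def dq_translation(dq):
    """Recover the translation: t = 2 * dual * conjugate(real)."""
    r = dq[0]
    t = qmul(dq[1], (r[0], -r[1], -r[2], -r[3]))
    return tuple(2.0 * c for c in t[1:])

# One step: 90 degrees about Z plus a unit shift along X, applied twice.
step = dq_from_rt(axis_angle((0.0, 0.0, 1.0), 90.0), (1.0, 0.0, 0.0))
two_steps = dq_mul(step, step)
position = dq_translation(two_steps)
```

Chaining step transforms by repeated `dq_mul` gives the accumulated frame of each base pair, and `dq_translation` reads off its center without a separate translation vector.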
Although displacements by slide and shift do not contribute to DNA bending, they make the main contribution to noise in subsequent measurements. For this reason, these parameters can be ignored in subsequent in silico estimations of DNA curvature. If only the DNA trajectory, excluding displacement by shift and slide, is required for subsequent measurements, the use of dual quaternions is redundant and the equations above for unit quaternions can be simplified to: [equation missing in source]. The three-dimensional DNA structures were calculated in the local Cartesian coordinate system in accordance with the transformations defined by the Cambridge convention on the definitions and nomenclature of nucleic acid structure components (Dickerson, 1989). The center of the first base pair is located at the origin of the coordinate frame, at point 0.
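A trajectory built from unit quaternions alone, with shift and slide ignored, might be sketched as below. The composition order (Z, then Y, then X), the choice to advance by rise along the local Z axis of the updated frame, and the names `step_quaternion` and `helical_path` are assumptions made for illustration; they are not taken from the equations of this paper.

```python
import math

def qmul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw*bw - ax*bx - ay*by - az*bz,
            aw*bx + ax*bw + ay*bz - az*by,
            aw*by - ax*bz + ay*bw + az*bx,
            aw*bz + ax*by - ay*bx + az*bw)

def axis_angle(axis, degrees):
    half = math.radians(degrees) / 2.0
    s = math.sin(half)
    return (math.cos(half), axis[0]*s, axis[1]*s, axis[2]*s)

def rotate(q, v):
    """Rotate vector v by unit quaternion q (q v q*)."""
    p = qmul(qmul(q, (0.0, v[0], v[1], v[2])), (q[0], -q[1], -q[2], -q[3]))
    return p[1:]

def step_quaternion(twist, roll, tilt):
    """Rotation of one step: about Z by twist, Y by roll, X by tilt (degrees)."""
    qz = axis_angle((0.0, 0.0, 1.0), twist)
    qy = axis_angle((0.0, 1.0, 0.0), roll)
    qx = axis_angle((1.0, 0.0, 0.0), tilt)
    return qmul(qmul(qz, qy), qx)

def helical_path(steps, rise=3.38):
    """Base pair centers for a list of (twist, roll, tilt) steps.

    The first base pair sits at the origin; each step accumulates the
    orientation and advances by `rise` (angstroms) along the local Z axis.
    """
    orientation = (1.0, 0.0, 0.0, 0.0)
    position = (0.0, 0.0, 0.0)
    path = [position]
    for twist, roll, tilt in steps:
        orientation = qmul(orientation, step_quaternion(twist, roll, tilt))
        dz = rotate(orientation, (0.0, 0.0, rise))
        position = tuple(p + d for p, d in zip(position, dz))
        path.append(position)
    return path

straight = helical_path([(36.0, 0.0, 0.0)] * 10)   # no roll or tilt: a straight axis
```

With zero roll and tilt the path stays on the Z axis; any non-zero roll or tilt bends it, and the resulting list of centers is the input assumed by the shape measures discussed in the following sections.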

Curvature
In general, the curvature vector of a spatial line is defined as the derivative of the unit tangent vector along this line. Its modulus is the inverse of the radius of curvature, and its direction is that of the principal normal to the curve (Landau, Lifshitz, 1970). For a DNA molecule, this means the angular deviation between the local helical axes of successive base pairs. The curvature forms the basis for further estimations of the shape of the DNA molecule, and for this reason there are many ways in which it can be measured. As a rule, curvature is expressed in DNA curvature units, where one curvature unit is defined as the average curvature of DNA in the nucleosome core particle, 1/(42.8 Å) (Trifonov, Ulanovsky, 1987).
The DNA curvature of a segment can be calculated as the inverse of the radius of the circle circumscribed about a triangle whose vertices lie on the helix axis at the center and at both ends of this segment. If the sliding window has a length of 2hw bp, the DNA curvature C at the i-th position is given by:
$$C_i = \frac{1}{R_i} = \frac{\sqrt{(a+b+c)(b+c-a)(a+c-b)(a+b-c)}}{abc},$$
where a, b and c are the distances between the points (i − hw, i), (i, i + hw) and (i − hw, i + hw), respectively. Curvature estimation by the least squares circle (LSC) fit is based on fitting a circle to the coordinates of a DNA molecule curved in a plane. The radius of this circle is taken as a measure of the curvature of the DNA fragment (Kanhere, Bansal, 2003). To evaluate the planarity of the bend and for subsequent projection, the best-fit (least squares) plane (LSP) is calculated for the set of base pair centers of the analyzed segment. Use of this method should be justified by low values of the root-mean-square deviation (RMSD) estimated for the LSC and LSP, i.e. the RMSD of the distances between the reference points and the fitted feature (circle or plane).
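The circumscribed-circle estimate can be sketched in a few lines. The sketch uses the standard circumradius identity 1/R = 4·area/(abc) with Heron's formula for the area; the function names and the optional conversion to curvature units (multiplication by 42.8 Å) are illustrative assumptions.

```python
import math

def circumcircle_curvature(p_prev, p_mid, p_next):
    """Inverse circumradius (1/angstrom) of the triangle through three points."""
    a = math.dist(p_prev, p_mid)
    b = math.dist(p_mid, p_next)
    c = math.dist(p_prev, p_next)
    # This product equals 16 * area^2 (Heron); clamp tiny negative round-off.
    sixteen_area_sq = (a + b + c) * (b + c - a) * (a + c - b) * (a + b - c)
    return math.sqrt(max(sixteen_area_sq, 0.0)) / (a * b * c)

def curvature_profile(path, hw, unit=42.8):
    """Curvature in curvature units at every position with a full 2*hw window."""
    return [circumcircle_curvature(path[i - hw], path[i], path[i + hw]) * unit
            for i in range(hw, len(path) - hw)]

# Three points on a circle of radius 42.8 angstroms give one curvature unit.
R = 42.8
on_circle = [(R, 0.0, 0.0), (0.0, R, 0.0), (-R, 0.0, 0.0)]
one_unit = curvature_profile(on_circle, hw=1)[0]
```

Collinear points give a curvature of zero, so a perfectly straight axis produces a flat profile.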
An original interpretation of DNA curvature was proposed by De Santis et al. (1988). Since the DNA curvature C(n) is a local property of the DNA axis and represents its directional change along the sequence, the distortions of the B-DNA axis along the chain give the value of curvature per turn, in modulus and phase, calculated for recurrent turns along the sequence (De Santis et al., 1988): [equation missing in source], where C(n) is the average curvature vector characterizing the deviation in orientation of the helical axis between sequence numbers n_1 and n_2, per turn of DNA with helical periodicity v (v = 10.4), assigned to the average sequence number of the tract (n = (n_2 + n_1)/2); the local deviation of the s-th base pair plane from canonical B-DNA is represented as a complex vector in terms of the roll and tilt angles. This is the most popular method for estimating DNA curvature directly from the nucleotide sequence, without modeling the helical path in advance.
The local bend angle characterizes the deflection of the helix axis and is calculated as the angle between tangent vectors along the contour of the DNA molecule (the arccosine of their scalar product if the tangents are unit vectors). The tangent vectors can be represented as unit vectors indicating the orientation of the base pair centers, as vectors connecting base pairs n and n + s (s ∈ [3, 15]) or, alternatively, as unit vectors (normals) averaged over 11-15 bp (centroids of helical turns). Some authors distinguish the bend angles estimated using tangent vectors obtained in the latter two ways as successive and cumulative (Kanhere, Bansal, 2003). The cumulative bending angle between averaged normal vectors >15 bp apart is often used in the estimation of DNA curvature (Goodsell, Dickerson, 1994; Gabrielian, Pongor, 1996).
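As a small illustration of the second option above, the bend angle at position i can be taken between the chords connecting base pair centers s bp upstream and downstream. The function name `bend_angle` and the chord-based choice of tangents are assumptions for this sketch only.

```python
import math

def bend_angle(path, i, s):
    """Angle (degrees) between chords path[i-s]->path[i] and path[i]->path[i+s]."""
    u = [b - a for a, b in zip(path[i - s], path[i])]
    v = [b - a for a, b in zip(path[i], path[i + s])]
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    # Clamp to [-1, 1] so round-off cannot push acos out of its domain.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))

# A right-angle kink at the middle point of a three-point path.
kinked = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
angle = bend_angle(kinked, i=1, s=1)
```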
DNA curving can be estimated from the radius of gyration (see Supplementary materials for details). When DNA molecules of identical size are compared, the radius of gyration is smaller for curved and larger for extended molecules (Dlakic, Harrington, 1998). Furthermore, information about different shape properties, such as symmetry, anisotropy, asphericity, acylindricity and more, can be extracted from various combinations of the orthogonal components of the radius of gyration tensor (Jernigan et al., 1987; Olson et al., 1993; Dlakic, Harrington, 1998; Kanhere, Bansal, 2003; Rawat, Biswas, 2009).
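The comparison described above can be reproduced with a minimal sketch that treats every base pair center as a point of equal mass, which is the usual definition of the radius of gyration for a point set. The function name and the two test shapes (a rod and an arc of radius 42.8 Å with the same number of base pairs) are illustrative assumptions.

```python
import math

def radius_of_gyration(path):
    """Root-mean-square distance of the points from their centroid."""
    n = len(path)
    center = [sum(p[k] for p in path) / n for k in range(3)]
    mean_sq = sum(sum((p[k] - center[k]) ** 2 for k in range(3))
                  for p in path) / n
    return math.sqrt(mean_sq)

rise, radius, n_bp = 3.38, 42.8, 21
rod = [(0.0, 0.0, rise * i) for i in range(n_bp)]
# Same number of base pairs placed along a circular arc of radius 42.8 angstroms.
arc = [(radius * math.cos(i * rise / radius),
        radius * math.sin(i * rise / radius), 0.0) for i in range(n_bp)]
rg_rod, rg_arc = radius_of_gyration(rod), radius_of_gyration(arc)
```

For these two shapes the arc gives the smaller value, in line with the statement that curved molecules have a smaller radius of gyration than extended ones of the same size.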

The d-max (Tung, Burks, 1987) is the maximum orthogonal distance from a base pair center to the straight line connecting the ends of the analyzed segment, evaluated over all base pair centers of this segment. It is estimated as the maximal perpendicular distance (p-dist) over all base pair centers, where the perpendicular distance is the minimal distance from the i-th base pair to the straight line connecting the ends of the segment.
Lastly, the curvature of the DNA molecule can be characterized simply by the ratio of the curvilinear distance (contour length) to the linear distance between the ends of the segment (SD value) (Eckdahl, Anderson, 1987; Tan, Harvey, 1987) or by the inverse ratio (Qc) (Dlakic, Harrington, 1998; Kanhere, Bansal, 2003; Matyašek et al., 2013). Large values of C, d-max and SD indicate a more curved DNA fragment.
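Both end-based measures follow directly from the coordinates of the base pair centers. The sketch below assumes a path given as a list of (x, y, z) tuples; the names `perpendicular_distance`, `d_max` and `contour_to_end_ratio` are hypothetical.

```python
import math

def perpendicular_distance(p, a, b):
    """Distance from point p to the straight line through a and b."""
    ab = [y - x for x, y in zip(a, b)]
    ap = [y - x for x, y in zip(a, p)]
    cross = (ab[1] * ap[2] - ab[2] * ap[1],
             ab[2] * ap[0] - ab[0] * ap[2],
             ab[0] * ap[1] - ab[1] * ap[0])
    return math.hypot(*cross) / math.hypot(*ab)

def d_max(path):
    """Largest perpendicular distance from any center to the end-to-end line."""
    return max(perpendicular_distance(p, path[0], path[-1]) for p in path)

def contour_to_end_ratio(path):
    """Contour length divided by end-to-end distance (the SD-type ratio)."""
    contour = sum(math.dist(p, q) for p, q in zip(path, path[1:]))
    return contour / math.dist(path[0], path[-1])

bent = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (2.0, 0.0, 0.0)]
```

The inverse of `contour_to_end_ratio` gives the Qc-type ratio; for a straight segment both ratios equal 1 and d-max is 0.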
The sliding window technique is a common way of analyzing the distribution of curvature along a DNA molecule. Reducing the size of the sliding window increases the resolution and magnitude of the curvature measurements, but also increases the noise arising from local bends (by roll and tilt) and displacements (by shift and slide) of adjacent base pairs. To reduce the noise, a sliding window whose size is a multiple of a helical turn (10 or 10.4 bp) and at least 20 bp is recommended. Furthermore, since local bends within a helical turn as a rule compensate each other, when the curvature is estimated for a position at the center of the sliding window, the size of the half window should be a multiple of 10 (for example, see the analysis of the "Hagerman paradox" below).

Persistence length
Persistence length is a key parameter for quantitative interpretation of the conformational properties of DNA. Theoretically, the persistence length can be formulated in terms of the magnitude of the projection vector and in terms of the tangent-tangent correlation function. In previous studies, both methods were successfully applied to estimation of the persistence length of DNA molecules from the reconstructed 3D trajectories of helical paths (Shpigelman et al., 1993; Bednar et al., 1995; Schellman, Harvey, 1995; Vologodskaia, Vologodskii, 2002).
In terms of the projection vector, the persistence length (P) can be defined as the average projection of the end-to-end vector (I_N) onto the unit vector of the first segment (I_1) in the limit of infinite chain length (Flory, 1969). For a long heterogeneous chain with an average link angle, the average projection of the end-to-end vector onto the direction (unit vector) of each segment along the chain is likely to be more statistically significant. Thus the persistence length, averaged over position and direction, is given in the limit N → ∞ by: [equation missing in source].
The persistence length measured in this way characterizes the chain length over which the memory of the initial orientation persists. It therefore depends strongly on the direction, length, homogeneity and shape of the molecule. Nevertheless, when L >> P this method can be used effectively for rapid estimation of the DNA persistence length.
According to the tangent-tangent correlation function (Landau, Lifshitz, 1958):
$$\langle\cos\theta\rangle = e^{-l/P},$$
where ⟨cos θ⟩ is the average cosine calculated from the complete set of local bend angles (θ) measured between the unit vectors (I) tangent to the chain at the points s_i and s_{i+1} (separated by a contour length l = s_{i+1} − s_i), collected over the entire chain. Tangent vectors can be calculated in the various ways described above, bearing in mind that the method employed will dramatically affect the deflection angle and hence the persistence length estimate. In particular, the persistence length depends strongly on the scale of measurement, i.e. on whether the unit segments of the DNA chain are bonds between base pairs or centroids of helical turns (see Supplementary materials).
The average cosine ⟨cos θ⟩ is a multiplicative function of the segment length l, assuming that the average directional correlation between two segments decays exponentially along the chain. The distribution of −ln⟨cos θ⟩ is evaluated for segments of different length l, where l is the average curvilinear distance between base pair centers separated by an n bp window moving along the sequence with a step of ≥ 1 bp. Since the correlation in the orientation of segments decays exponentially with their separation, −ln⟨cos θ⟩ is linear in l. The persistence length can be calculated as the inverse of the regression slope of the −ln⟨cos θ⟩ plot. Application of this method should be justified by a good linear fit of the −ln⟨cos θ⟩ distribution (coefficient of determination R² ≥ 0.9).
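The regression step can be sketched as follows. This is an illustration under stated assumptions: tangents are taken as unit vectors between consecutive base pair centers, the fit is an ordinary least squares line with an intercept, and the function names are hypothetical.

```python
import math

def unit_tangents(path):
    """Unit vectors between consecutive base pair centers."""
    tangents = []
    for p, q in zip(path, path[1:]):
        d = [b - a for a, b in zip(p, q)]
        length = math.hypot(*d)
        tangents.append([c / length for c in d])
    return tangents

def mean_cosine(path, n):
    """Average cosine of the angle between tangents n steps apart."""
    t = unit_tangents(path)
    cosines = [sum(a * b for a, b in zip(t[i], t[i + n]))
               for i in range(len(t) - n)]
    return sum(cosines) / len(cosines)

def persistence_length(lengths, mean_cosines):
    """Fit -ln<cos> against l; return (1/slope, coefficient of determination)."""
    y = [-math.log(c) for c in mean_cosines]
    n = len(lengths)
    mx, my = sum(lengths) / n, sum(y) / n
    sxx = sum((x - mx) ** 2 for x in lengths)
    syy = sum((v - my) ** 2 for v in y)
    sxy = sum((x - mx) * (v - my) for x, v in zip(lengths, y))
    return sxx / sxy, (sxy * sxy) / (sxx * syy)

# Synthetic check: cosines generated with P = 500 angstroms are recovered.
lengths = [3.38 * n for n in (10, 20, 30, 40)]
cosines = [math.exp(-l / 500.0) for l in lengths]
p_est, r_squared = persistence_length(lengths, cosines)
```

In practice `mean_cosine` would be evaluated on a modeled trajectory for several window sizes n, and the estimate accepted only when `r_squared` meets the R² ≥ 0.9 criterion stated above.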
The equations above are suitable only for intrinsically straight, homogeneous polymers. However, in most cases the analyzed DNA is not straight, but contains numerous planar and coplanar static bends as well as segments with different flexibilities, located at any position along the chain. For such heterogeneous polymers, the persistence length depends strongly on the starting position, the direction and the length of the fragment. One possible solution is to divide the DNA path into a set of large fragments between the bends and a set of small fragments overlapping these bends, and then to estimate P for each of these fragments separately (the Supplementary material contains examples for different cases). The average persistence length of the chain can then be derived by summation of P over all fragments (N_f) of length l (Rivetti et al., 1998): [equation missing in source]. Thus, the resultant persistence length (averaged over all fragments of the molecule) will be determined mainly by the minimal P among all fragments. Furthermore, it seems that the contribution of intrinsic bends to the persistence length calculated in this way will be decreased.
The apparent persistence length (P_a) of the DNA molecule includes contributions from both the static (P_s) and dynamic (P_d) persistence lengths, which are related as follows (Trifonov et al., 1988; Schellman, Harvey, 1995):
$$\frac{1}{P_a} = \frac{1}{P_s} + \frac{1}{P_d}.$$
The static persistence length is determined by the intrinsic DNA curvature (shape) at the minimum energy conformation, without accounting for thermal fluctuations. The equations above, applied to the predicted DNA trajectories, estimate the static persistence length. Conversely, the dynamic persistence length characterizes the rigidity of the DNA molecule under thermal fluctuations of the angles between adjacent base pairs. Treating crystal packing, interactions with proteins and thermal fluctuations as external perturbing force fields that eliminate correlations in the bend directions, the dynamic persistence length can be estimated from the DNA sequence using a set of dispersions of the roll (ρ) and tilt (τ) angles (which can be obtained from the same sources as the base pair parameters or during averaging of base pair parameters, see Section 1) for all dinucleotide steps (Vologodskaia, Vologodskii, 2002): [equation missing in source], where p_i is the probability of dinucleotide i, the segment length l << P is the average rise (~3.38 Å) and ⟨θ_i²⟩ is the variance of the local bend angle (in radians) of the i-th dinucleotide (⟨θ_i²⟩ = Δ²ρ_i + Δ²τ_i, ρ and τ in radians, θ_i << π/2). It should be noted, however, that whereas the flexibility of base pair steps overall correlates well with local bends as determined by roll and tilt (Olson et al., 1998; Packer et al., 2000; McConnell, Beveridge, 2001), the bends produced by A-tracts are highly stable, with a midpoint of structural transition near 30 °C (Chan et al., 1993) and a melting temperature above 37 °C (Chan et al., 1990; Jerkovic, Bolton, 2000). Thus, although the static persistence length of bends produced by A-tracts is very small, the dynamic persistence length of these bends will be very large. Since P_a is always less than either P_s or P_d, the correlation between the apparent persistence length, calculated from the sum of P_s⁻¹ and P_d⁻¹, and the stiffness of DNA containing A-tract-induced bends will be lost. Extremely low static (P_s) and exceptionally high dynamic (P_d) persistence lengths at a relatively large apparent persistence length (P_a) were noted in a recent study of DNA molecules containing large A-tract-related intrinsic bends (Mitchell et al., 2017).
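The reciprocal combination of the static and dynamic contributions, and a small-angle estimate of the dynamic term, can be sketched numerically. The second function is an assumption rather than the published expression: it follows from ⟨cos θ⟩ = exp(−l/P) with cos θ ≈ 1 − θ²/2 for small angles, which gives P_d ≈ 2l/Σ p_i⟨θ_i²⟩. The function names and the example variance are hypothetical.

```python
def apparent_persistence_length(p_static, p_dynamic):
    """Combine static and dynamic terms: 1/P_a = 1/P_s + 1/P_d."""
    return 1.0 / (1.0 / p_static + 1.0 / p_dynamic)

def dynamic_persistence_length(probabilities, bend_variances, rise=3.38):
    """Small-angle estimate P_d ~ 2*l / sum(p_i * <theta_i^2>) (assumed form).

    probabilities: occurrence probability of each dinucleotide step.
    bend_variances: variance of roll plus variance of tilt, in radians^2.
    rise: segment length l in angstroms.
    """
    mean_variance = sum(p * v for p, v in zip(probabilities, bend_variances))
    return 2.0 * rise / mean_variance

# Hypothetical single step type with a bend variance of 0.01 rad^2.
p_d = dynamic_persistence_length([1.0], [0.01])
p_a = apparent_persistence_length(1000.0, 1000.0)
```

Because the terms add as reciprocals, the apparent value is always smaller than either contribution, which is the property used in the argument about A-tract bends above.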

Helical phasing. DFT analysis
The discrete Fourier transform is extremely useful for the analysis of macroscopic DNA curvature, owing to its ability to reveal periodicity in the analyzed DNA properties, such as local curvature in a sliding window, as well as the relative strength of any periodic components. Subsequent analysis of the DFT frequency spectrum makes it possible to detect and qualitatively evaluate the regular alternation of DNA bends along the molecule. The discrete Fourier transform is defined as: [equation missing in source], where x_j is the property value at the j-th position, represented as a complex number with a zero imaginary part, and k is the frequency bin (for positive frequencies, k ∈ [0, N/2 − 1]).
In the case of DNA analysis, the sampling frequency (fs, sampling rate) is the ratio of the data set size (N) to the step of the sliding window. The minimal frequency (frequency resolution) is given by Δf = fs/N and the maximal frequency (Nyquist frequency) is fs/2. Since the DFT spectrum is N-periodic, each frequency bin characterizes the distribution of values of the analyzed feature with a frequency of f per N bp and a maximum of 1/2 bp⁻¹ (f = Δf · x = x, x ∈ [0, N/2]).
The magnitude of the frequency bin for real input data is calculated by Eq. (14) [equation only partly preserved in source: … x_{2j+1} e^{2πijk/(N/2)} |²]. The phasing of systematic bends can be evaluated from the distribution of local curvature by frequency analysis of the DFT power spectrum. The frequency of helix turns (f_h) in the DFT spectrum is N/10 (where 10 is the floor of the average helical repeat, 10.4 bp) and the initial phase is 0°. Local bends repeated with a frequency that is a multiple of f_h, or that varies around such a value, are phased with the helix screw: they have the same direction and progressively increase the macroscopic curvature of the DNA molecule. For local bends repeated in antiphase, the fractional part of the ratio of f_h to the bend frequency is close to 0.5: these bends are oriented oppositely (the phase shift is 180°, 2π · 0.5) and compensate each other.
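A power spectrum of a curvature profile can be obtained with a direct, unoptimized DFT, sketched below. The negative sign in the exponent is the common convention and is an assumption here; for real input the power |X_k|² does not depend on that sign. The function names are hypothetical, and the mean is removed so that bin 0 does not dominate.

```python
import cmath
import math

def dft_power_spectrum(values):
    """Power |X_k|^2 for bins k = 0 .. N//2, by a direct O(N^2) DFT."""
    n = len(values)
    power = []
    for k in range(n // 2 + 1):
        x_k = sum(v * cmath.exp(-2j * math.pi * j * k / n)
                  for j, v in enumerate(values))
        power.append(abs(x_k) ** 2)
    return power

def dominant_frequency(values):
    """Bin (cycles per N samples) with the largest power, excluding bin 0."""
    mean = sum(values) / len(values)
    power = dft_power_spectrum([v - mean for v in values])
    return max(range(1, len(power)), key=power.__getitem__)

# Synthetic profile of 128 samples with 13 cycles, i.e. close to N/10.
profile = [math.cos(2.0 * math.pi * 13 * j / 128) for j in range(128)]
peak = dominant_frequency(profile)
```

For long profiles a fast Fourier transform would replace the double loop; the direct form is kept here only because it mirrors the definition term by term.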
In order to demonstrate how the phasing of DNA bends affects the shape of molecules, the trajectories of the $(A_5N_5)_n$ (curved DNA, local bends in phase) and $(A_5N_{10})_n$ (straight DNA, local bends out of phase) sequences were modeled and the DFT power spectrum of the curvature distribution was analyzed (Fig. 2). If the sample size N is 128, then $f_h$ = 13 (128/10, rounded). The $(A_5N_{10})_n$ DNA fragment contains systematic bends repeated with a frequency between 8 and 9, which corresponds to $\{f_h/8.5\}$ ≈ 0.5. Hence, the phases of neighboring bends are opposite and the molecule is almost straight (see Fig. 2). The systematic bends of the $(A_5N_5)_n$ DNA fragment are repeated with a frequency of 13, which corresponds to $f_h$; hence, the local bends alternate in phase with the helix periodicity, and the resulting global curvature of the DNA molecule is large (see Fig. 2).
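The arithmetic of this example can be reproduced with a few lines of Python. This is a minimal sketch: the function names and the tolerance `tol` used to decide what counts as "near 0" or "near 0.5" are assumptions introduced here for illustration, not values taken from this work.

```python
import math


def helix_frequency(n_samples, helical_repeat=10):
    """Frequency bin of the helix turns, f_h = N / 10, rounded to a whole bin."""
    return round(n_samples / helical_repeat)


def phase_fraction(f_helix, f_bend):
    """Fractional part of f_h / f_bend, written {f_h / f_bend} in the text."""
    ratio = f_helix / f_bend
    return ratio - math.floor(ratio)


def classify(fraction, tol=0.1):
    """Label a fractional part; tol is a hypothetical tolerance."""
    if min(fraction, 1.0 - fraction) <= tol:
        return "in phase"
    if abs(fraction - 0.5) <= tol:
        return "antiphase"
    return "intermediate"


f_h = helix_frequency(128)                    # 13 for N = 128
a5n5 = classify(phase_fraction(f_h, 13))      # bends repeated at frequency 13
a5n10 = classify(phase_fraction(f_h, 8.5))    # bends repeated between 8 and 9
```

Here `a5n5` evaluates to "in phase" (fractional part 0) and `a5n10` to "antiphase" (fractional part of 13/8.5 is about 0.53).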
The magnitude of this frequency bin in terms of the discrete Hartley transform (DHT) can be expressed as:
$$|X_k| = \sqrt{\frac{H_k^2 + H_{N-k}^2}{2}},$$
where $H_k$ is the k-th DHT coefficient. The DFT power spectrum should be calculated for frequencies in the range between the minimal and maximal values of the twist angle (as a rule, from 27° to 42° for B-DNA). The supplementary material contains some recommendations on the use of DFT analysis in studying the distribution of DNA curvature.
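As a numerical cross-check, the sketch below assumes that DHT denotes the discrete Hartley transform and uses the standard identity $|X_k|^2 = (H_k^2 + H_{N-k}^2)/2$ relating the Hartley coefficients $H_k$ to the DFT magnitude of a real sequence. The function names and the sample values are hypothetical.

```python
import cmath
import math


def dft(values):
    """Naive DFT: X_k = sum_j x_j * exp(-2*pi*i*j*k/N)."""
    n = len(values)
    return [sum(x * cmath.exp(-2j * math.pi * j * k / n)
                for j, x in enumerate(values))
            for k in range(n)]


def dht(values):
    """Naive discrete Hartley transform: H_k = sum_j x_j * (cos + sin)(2*pi*j*k/N)."""
    n = len(values)
    return [sum(x * (math.cos(2 * math.pi * j * k / n)
                     + math.sin(2 * math.pi * j * k / n))
                for j, x in enumerate(values))
            for k in range(n)]


def magnitude_from_dht(h, k):
    """|X_k| recovered from Hartley coefficients H_k and H_{N-k}."""
    n = len(h)
    return math.sqrt((h[k] ** 2 + h[(n - k) % n] ** 2) / 2.0)


# Hypothetical real-valued input (e.g. curvature values in a sliding window).
sample = [0.3, 1.2, 0.8, 0.1, 0.9, 1.5, 0.4, 0.7]
X = dft(sample)
H = dht(sample)
max_diff = max(abs(abs(X[k]) - magnitude_from_dht(H, k))
               for k in range(len(sample)))
```

For real input the two routes agree to floating-point precision, so `max_diff` is negligible; the Hartley form avoids complex arithmetic.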

Phasing of coplanar and non-periodic bends
It seems obvious that non-periodic bends that are in phase, as well as bends in opposite directions that are in antiphase, will increase the macroscopic DNA curvature. Furthermore, the DFT method does not distinguish the orientation of bends, so that coplanar bends will be taken as planar, which distorts the results. For this reason, it is useful to define the phase relative to the direction of the DNA trajectory. In order to define the relative phase ($\varphi_{vb}$) of the vector position of the center of a bend (the "bend vector"; for example, this can be the center of an analyzed segment: $vb_{n+s} = v_{n+s} - v_n$), the "directional vector" $vd_n = v_{n+2s} - v_n$ is aligned with the X axis and the coordinate frame is then rotated around the Y axis by 90° to align $vd$ with the Z axis (the direction of this rotation is determined by the sign of the x coordinate of $vd$). The quaternion realizing these rotations is denoted $Q_\varphi$, where $\sqrt{(1 + qR_0)/2}$ is equivalent to $\cos(\arccos(qR_0)/2)$, $\sqrt{(1 - qR_0)/2}$ is equivalent to $\sin(\arccos(qR_0)/2)$, and $\{qR_1, qR_2, qR_3\}/qR$ are the direction cosines with respect to the X, Y and Z axes, respectively.
To align the unit vector with the X axis, the calculation of qR can be simplified; the ratio $x_{vd}/|x_{vd}|$ is needed to obtain the sign. The phase is estimated over all four quadrants as the cosine of the angle between the projection of the curvature vector on the XY plane and the direction of the X axis (1, 0, 0).
For example, Hagerman has shown that DNA sequences containing repeating runs of $A_4T_4$ in phase were significantly bent, whereas those with $T_4A_4$ (A-tracts in opposite polarity) were almost straight (Hagerman, 1986). Thus, there is a fundamental difference in the structure of $A_nT_n$ and $T_nA_n$ DNA segments, despite their identical nucleotide composition and phased bends. In fact, the distributions of curvature in these DNA molecules have the same DFT frequency spectrum, with a peak of magnitude at a period of 10.04 bp, which is close to a multiple of the helical period ({10.4/10.04} = 0.036 << 0.5); this is also consistent with experimental data (Price, Tullius, 1993). Thus, at first glance, the phased bends should significantly increase the macroscopic curvature of the DNA molecules in both cases. However, analysis of the phase spectrum (Fig. 3) makes it clear why the DNA fragments are curved in the first case ($(CA_4T_4G)_n$) but not in the second. In particular, it shows that the bends produced by $(CT_4A_4G)_n$ are oriented in opposite directions and mutually compensate each other (see Fig. 3). However, it should be noted that the magnitudes of the DFT power spectrum of the straight $(CT_4A_4G)_n$ fragment will be relatively high only if the size of the sliding window in which DNA curvature is estimated is a multiple of an odd number of turns. If the size of the half window is a multiple of 10 (one helical turn), the amplitude of curvature of the $(CT_4A_4G)_n$ fragment is close to 0.
This is consistent with the fact that oppositely oriented bends in this motif are mutually compensated within one turn (Stefl et al., 2004). The present example shows how the size of the sliding window can be adapted for specific research tasks.
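The relative-phase construction described in this section can be sketched in Python as follows. Several simplifying assumptions are made and should not be read as the exact procedure of this work: the directional vector is rotated onto the Z axis in a single shortest-arc rotation (instead of the two-step alignment via the X axis), the half-angle terms $\sqrt{(1 \pm c)/2}$ are used with c equal to the cosine of the angle between the directional vector and Z, and the phase is returned as a four-quadrant angle via `atan2`. All function names and the three-point test paths are hypothetical.

```python
import math


def q_mul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return (w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
            w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
            w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
            w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2)


def q_rotate(q, v):
    """Rotate the 3D vector v by the unit quaternion q (computes q v q*)."""
    w, x, y, z = q
    return q_mul(q_mul(q, (0.0, v[0], v[1], v[2])), (w, -x, -y, -z))[1:]


def q_align_to_z(d):
    """Unit quaternion rotating direction d onto +Z (shortest arc; an assumption)."""
    norm = math.sqrt(sum(c * c for c in d))
    ux, uy, uz = (c / norm for c in d)
    ax, ay = uy, -ux                      # rotation axis u x Z = (uy, -ux, 0)
    a_norm = math.hypot(ax, ay)
    if a_norm < 1e-12:                    # d already parallel to Z
        return (1.0, 0.0, 0.0, 0.0) if uz > 0 else (0.0, 1.0, 0.0, 0.0)
    half_cos = math.sqrt((1.0 + uz) / 2.0)   # cos(acos(uz) / 2)
    half_sin = math.sqrt((1.0 - uz) / 2.0)   # sin(acos(uz) / 2)
    return (half_cos, half_sin * ax / a_norm, half_sin * ay / a_norm, 0.0)


def relative_phase(points, n, s):
    """Phase in degrees of the bend vector at the centre of segment [n, n + 2s]."""
    vd = tuple(p - q for p, q in zip(points[n + 2 * s], points[n]))  # directional vector
    vb = tuple(p - q for p, q in zip(points[n + s], points[n]))      # bend vector
    x, y, _ = q_rotate(q_align_to_z(vd), vb)
    return math.degrees(math.atan2(y, x))  # XY projection, all four quadrants


# Hypothetical three-point paths bending towards +Y and towards -Y.
up = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (2.0, 0.0, 0.0)]
down = [(0.0, 0.0, 0.0), (1.0, -1.0, 0.0), (2.0, 0.0, 0.0)]
phase_up = relative_phase(up, 0, 1)
phase_down = relative_phase(down, 0, 1)
```

The two test paths bend in opposite directions and yield phases that differ by 180°, the situation in which bends mutually compensate. A magnitude spectrum alone would not separate them.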

Conclusion
More than 30 years ago it was shown that computer modeling of DNA sequences is a viable approach to the study of the biological implications of DNA structure. A lot of research has been devoted to this problem. Estimating the intrinsic curvature of relatively short DNA fragments (< 10 kb) can be useful in investigating various features, such as the analysis of promoter and regulatory regions of specific genes or of the parts of the genome associated with recombination suppression and heterochromatin packing, nucleosome positioning, the design of shape-related DNA markers, and many others. Furthermore, a topical aim is the monitoring of DNA curvature and bendability over hundreds and thousands of base pairs. Similar studies, previously performed for many prokaryotic and eukaryotic genomes, provided important information about their spatial organization and its influence on various biological processes. The algorithms discussed in the present guide formed the basis for such investigations and will be useful in the analysis of relatively short DNA fragments as well as for topological mapping of whole genomes.

Fig. 1. Base pair orientation parameters, following the Cambridge convention. Three rotations (twist, roll, tilt) and three translations (rise, slide, shift) of a base pair about the Z, Y and X axes are indicated.

Fig. 2. Reconstructed DNA paths of the experimental molecules containing repeated A-tracts in phase ($(A_5N_5)_n$) and out of phase ($(A_5N_{10})_n$) with the helix screw. DFT power spectra for both molecules were calculated using values of DNA curvature as input data, estimated in a sliding window of 40 bp with a step of 1 bp, using equation (6). The dominant frequencies for each spectrum are indicated.