304 Commits

Author SHA1 Message Date
Gerrit Renker
2bbf29acd8 [DCCP] tfrc: Binary search for reverse TFRC lookup
This replaces the linear search algorithm for reverse lookup with
binary search.

It has the advantage of better scalability: O(log2(N)) instead of O(N).
This means that the average number of iterations is reduced from 250
(linear search if each value appears equally likely) down to at most 9.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-03 14:53:27 -02:00
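Note on 2bbf29acd8: the binary search described above can be sketched as follows. This is a minimal illustration, not the kernel code: the helper name reverse_lookup_index and the table demo_table are hypothetical, and the table is assumed to be sorted in ascending order. With N = 500 entries (the size implied by the average of 250 linear iterations), the loop runs at most ceil(log2(500)) = 9 times.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ascending table standing in for one column of the lookup table. */
static const uint32_t demo_table[] = { 10, 20, 30, 40, 50, 60, 70, 80 };

/*
 * Return the smallest index whose entry is >= fval.
 * If every entry is smaller than fval, the last index (n - 1) is returned.
 * n must be at least 1.
 */
static size_t reverse_lookup_index(const uint32_t *table, size_t n, uint32_t fval)
{
	size_t low = 0, high = n - 1;

	while (low < high) {
		size_t mid = low + (high - low) / 2;

		if (fval <= table[mid])
			high = mid;     /* answer is at mid or to its left */
		else
			low = mid + 1;  /* answer is strictly to the right */
	}
	return high;
}
```

The interval [low, high] is halved in every iteration, which is where the O(log2(N)) bound comes from.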
Gerrit Renker
44158306d7 [DCCP] ccid3: Deprecate TFRC_SMALLEST_P
This patch deprecates the existing use of an arbitrary value TFRC_SMALLEST_P
 for low-threshold values of p. This avoids masking low-resolution errors.
 Instead, the code now checks against real boundaries (implemented by preceding
 patch) and provides warnings whenever a real value falls below the threshold.

 If such messages are observed, it is a better solution to take this as an
 indication that the lookup table needs to be re-engineered.

Changelog:
----------
 This patch
   * makes handling all TFRC resolution errors local to the TFRC library

   * removes unnecessary test whether X_calc is 'infinity' due to p==0 -- this
     condition is already caught by tfrc_calc_x()

   * removes setting ccid3hctx_p = TFRC_SMALLEST_P in ccid3_hc_tx_packet_recv
     since this is now done by the TFRC library

   * updates BUG_ON test in ccid3_hc_tx_no_feedback_timer to take into account
     that p now is either 0 (and then X_calc is irrelevant), or it is > 0; since
     the handling of TFRC_SMALLEST_P is now taken care of in the tfrc library

Justification:
--------------
 The TFRC code uses a lookup table which has a bounded resolution.
 The lowest possible value of the loss event rate `p' which can be
 resolved is currently 0.0001.  Substituting this lower threshold for
 p when p is less than 0.0001 results in a huge, exponentially-growing
 error.  The error can be computed by the following formula:

    (f(0.0001) - f(p))/f(p) * 100      for p < 0.0001

 Currently the solution is to use an (arbitrary) value
     TFRC_SMALLEST_P  =   40 * 1E-6   =   0.00004
 and to consider all values below this value as `virtually zero'.  Due to
 the exponentially growing resolution error, this is not a good idea, since
 it hides the fact that the table can not resolve practically occurring cases.
 Already at p == TFRC_SMALLEST_P, the error is as high as 58.19%!

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-03 14:53:07 -02:00
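Note on 44158306d7: the error formula in the justification can be evaluated with a few lines of C. This sketch assumes that f(p) is the loss term of the RFC 3448 throughput equation with b = 1 and t_RTO = 4*R, i.e. f(p) = sqrt(2p/3) + 12*sqrt(3p/8)*p*(1 + 32p^2); the commit itself does not spell f out. The function names are hypothetical, and the square root is a hand-rolled Newton iteration so that the example does not depend on libm.

```c
/* Square root by Newton's iteration (keeps the sketch free of libm). */
static double newton_sqrt(double v)
{
	double x = 1.0;
	int i;

	if (v <= 0.0)
		return 0.0;
	for (i = 0; i < 64; i++)
		x = 0.5 * (x + v / x);
	return x;
}

/* Assumed form of f(p): RFC 3448 equation with b = 1 and t_RTO = 4*R. */
static double tfrc_f(double p)
{
	return newton_sqrt(2.0 * p / 3.0) +
	       12.0 * newton_sqrt(3.0 * p / 8.0) * p * (1.0 + 32.0 * p * p);
}

/* Error in percent when f(0.0001) is substituted for f(p), for p < 0.0001. */
static double resolution_error_percent(double p)
{
	return (tfrc_f(0.0001) - tfrc_f(p)) / tfrc_f(p) * 100.0;
}
```

Under these assumptions, resolution_error_percent(0.00004) comes out at roughly 58.2, in line with the figure quoted in the commit message, and the error keeps growing as p shrinks further.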
Gerrit Renker
006042d7e1 [DCCP] tfrc: Identify TFRC table limits and simplify code
This
 * adds documentation about the lowest resolution that is possible within
   the bounds of the current lookup table
 * defines a constant TFRC_SMALLEST_P which defines this resolution
 * issues a warning if a given value of p is below resolution
 * combines two previously adjacent if-blocks of nearly identical
   structure into one

This patch does not change the algorithm as such.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-03 14:52:41 -02:00
Gerrit Renker
8d0086adac [DCCP] tfrc: Add protection against invalid parameters to TFRC routines
1) For the forward X_calc lookup, it
    * protects effectively against RTT=0 (this case is possible), by
      returning the maximal lookup value instead of just setting it to 1
    * reformulates the array-bounds exceeded condition: this only happens
      if p is greater than 1E6 (due to the scaling)
    * the case of negative indices can now with certainty be excluded,
      since documentation shows that the formulas are within bounds
    * additional protection against p = 0 (would give divide-by-zero)

 2) For the reverse lookup, it
    * protects against exceeding array bounds
    * now returns 0 if f(p) = 0, due to function definition
    * warns about minimal resolution error and returns the smallest table
      value instead of p=0 [this would mask congestion conditions]

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-03 14:52:26 -02:00
Gerrit Renker
90fb0e60dd [DCCP] tfrc: Fix small error in reverse lookup of p for given f(p)
This fixes the following small error in tfrc_calc_x_reverse_lookup.

 1) The table is generated by the following equations:
	lookup[index][0] = g((index+1) * 1000000/TFRC_CALC_X_ARRSIZE);
	lookup[index][1] = g((index+1) * TFRC_CALC_X_SPLIT/TFRC_CALC_X_ARRSIZE);
    where g(q) is 1E6 * f(q/1E6)

 2) The reverse lookup assigns an entry in lookup[index][small]

 3) This index needs to match the above, i.e.
    * if small=0 then

      		p  = (index + 1) * 1000000/TFRC_CALC_X_ARRSIZE

    * if small=1 then

		p = (index+1) * TFRC_CALC_X_SPLIT/TFRC_CALC_X_ARRSIZE

These are exactly the changes that the patch makes; previously the code did
not conform to the way the lookup table was generated (this difference resulted
in a mean error of about 1.12%).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-03 14:52:01 -02:00
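Note on 90fb0e60dd: the mapping from a table index back to p can be written down directly from the two generating equations. The constants below are assumptions (500 entries and a split at 50000); the commit only names TFRC_CALC_X_ARRSIZE and TFRC_CALC_X_SPLIT without giving their values. The helper index_to_p is hypothetical.

```c
#include <stdint.h>

#define ARRSIZE 500u     /* assumed value of TFRC_CALC_X_ARRSIZE */
#define SPLIT   50000u   /* assumed value of TFRC_CALC_X_SPLIT */

/*
 * Return p (scaled by 1E6) for which lookup[index][small] was generated.
 * The (index + 1) term mirrors the table generation; dropping the "+ 1"
 * is the kind of mismatch the commit describes.
 */
static uint32_t index_to_p(uint32_t index, int small)
{
	if (small)
		return (index + 1) * SPLIT / ARRSIZE;
	return (index + 1) * 1000000u / ARRSIZE;
}
```

With these values the coarse column covers p from 0.002 to 1.0 and the fine column covers p from 0.0001 to 0.05.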
Gerrit Renker
50ab46c790 [DCCP] tfrc: Document boundaries and limits of the TFRC lookup table
This adds documentation for the TCP Reno throughput equation which is at
the heart of the TFRC sending rate / loss rate calculations.

It spells out precisely how the values were determined and what they mean.
The equations were derived through reverse engineering and found to be
fully accurate (verified using test programs).

This patch does not change any code.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-03 14:51:29 -02:00
Gerrit Renker
26af3072b0 [DCCP] ccid3: Fix warning message about illegal ACK
This avoids a (harmless) warning message being printed at the DCCP server
(the receiver of a DCCP half connection).

Incoming packets are both directed to

 * ccid_hc_rx_packet_recv() for the server half
 * ccid_hc_tx_packet_recv() for the client half

The message gets printed since on a server the client half is currently not
sending data packets.
This is resolved for the moment by checking the DCCP-role first. In future
times (bidirectional DCCP connections), this test may have to be more
sophisticated.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-03 14:51:14 -02:00
Gerrit Renker
5c3fbb6acf [DCCP] ccid3: Fix bug in calculation of send rate
The main object of this patch is the following bug:
 ==> In ccid3_hc_tx_packet_recv, the parameters p and X_recv were updated
     _after_ the send rate was calculated. This is clearly an error and is
     resolved by re-ordering statements.

In addition,
  * r_sample is converted from u32 to long to check whether the time difference
    was negative (it would otherwise be converted to a large u32 value)
  * protection against RTT=0 (this is possible) is provided in a further patch
  * t_elapsed is also converted to long, to match the type of r_sample
  * adds more debugging information regarding current send rates
  * various trivial comment/documentation updates

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-03 14:50:56 -02:00
Gerrit Renker
76d127779e [DCCP]: Fix BUG in retransmission delay calculation
This bug resulted in ccid3_hc_tx_send_packet returning negative
delay values, which in turn triggered silently dequeueing packets in
dccp_write_xmit. As a result, only a few out of the submitted packets made
it at all onto the network.  Occasionally, when dccp_wait_for_ccid was
involved, this also triggered a bug warning since ccid3_hc_tx_send_packet
returned a negative value (which in reality was a negative delay value).

The cause for this bug lies in the comparison

 if (delay >= hctx->ccid3hctx_delta)
	return delay / 1000L;

The type of `delay' is `long', that of ccid3hctx_delta is `u32'. When comparing
negative long values against u32 values, the test returned `true' whenever delay
was smaller than 0 (meaning the packet was overdue to send).

The fix is by casting, subtracting, and then testing the difference with
regard to 0.

This has been tested and shown to work.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-03 14:50:42 -02:00
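Note on 76d127779e: the signed/unsigned pitfall can be reproduced in user space. The sketch below uses fixed-width 32-bit types to model a platform where `long' is 32 bits wide, and it makes the conversion explicit that the compiler otherwise performs silently. Both function names are hypothetical; the fixed variant follows the "cast, subtract, compare with 0" description and is not the literal kernel change.

```c
#include <stdint.h>

/*
 * What the original test effectively computed: the signed delay is
 * converted to unsigned before the comparison, so a negative delay
 * turns into a very large value and the test succeeds.
 */
static int delay_reached_buggy(int32_t delay, uint32_t delta)
{
	return (uint32_t)delay >= delta;
}

/* Cast both operands to a wider signed type, subtract, test against 0. */
static int delay_reached_fixed(int32_t delay, uint32_t delta)
{
	return (int64_t)delay - (int64_t)delta >= 0;
}
```

For delay = -5 and delta = 100 the first variant reports that the delay has been reached, while the second does not.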
Gerrit Renker
8a508ac26e [DCCP]: Use higher RTO default for CCID3
The TFRC nofeedback timer normally expires after the maximum of 4
RTTs and twice the current send interval (RFC 3448, 4.3). On LANs
with a small RTT this can mean a high processing load and reduced
performance, since then the nofeedback timer is triggered very
frequently.

This patch provides a configuration option to set the bound for the
nofeedback timer, using as default 100 milliseconds.

By setting the configuration option to 0, strict RFC 3448 behaviour
can be enforced for the nofeedback timer.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-03 14:50:23 -02:00
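Note on 8a508ac26e: a rough sketch of the bounded nofeedback timeout. It assumes that the configured value acts as a lower bound on the timeout and that the send interval equals s/X; the function name and the microsecond units are choices made for this illustration only.

```c
#include <stdint.h>

/*
 * All values are in microseconds.
 *   rtt_us    current round-trip time R
 *   t_ipi_us  current send interval (s/X)
 *   floor_us  configured lower bound; 0 gives strict RFC 3448 behaviour
 */
static uint64_t nofeedback_timeout_us(uint32_t rtt_us, uint32_t t_ipi_us,
				      uint32_t floor_us)
{
	uint64_t four_rtt = 4ull * rtt_us;
	uint64_t two_ipi  = 2ull * t_ipi_us;
	uint64_t rfc      = four_rtt > two_ipi ? four_rtt : two_ipi;

	return rfc > floor_us ? rfc : floor_us;
}
```

On a LAN with R = 200 microseconds, the strict value is below one millisecond, whereas the default bound of 100 milliseconds keeps the timer from firing that often.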
Gerrit Renker
6b57c93dc3 [DCCP]: Use `unsigned' for packet lengths
This patch implements a suggestion by Ian McDonald and

 1) Avoids tests against negative packet lengths by using unsigned int
    for packet payload lengths in the CCID send_packet()/packet_sent() routines

 2) As a consequence, it removes a now unnecessary test with regard to `len > 0'
    in ccid3_hc_tx_packet_sent: that condition is always true, since
      * negative packet lengths are avoided
      * ccid3_hc_tx_send_packet flags an error whenever the payload length is 0.
        As a consequence, ccid3_hc_tx_packet_sent is never called as all errors
        returned by ccid_hc_tx_send_packet are caught in dccp_write_xmit

 3) Removes the third argument of ccid_hc_tx_send_packet (the `len' parameter),
    since it is currently always set to skb->len. The code is updated with regard
    to this parameter change.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:31:02 -08:00
Gerrit Renker
a79ef76f4d [DCCP] ccid3: Larger initial windows
This implements the larger-initial-windows feature for CCID 3, as described in
section 5 of RFC 4342. When the first feedback packet arrives, the sender can
send up to 2..4 packets per RTT, instead of just one.

The patch further
 * reduces the number of timestamping calls by passing the timestamp value
   (which is computed in one of the calling functions anyway) as argument

 * renames one constant with a very long name into one which is shorter and
   resembles the one in RFC 3448 (t_mbi)

 * simplifies some of the min_t/max_t cases where both `x', `y' have the same
   type

Committer note: renamed TFRC_t_mbi to TFRC_T_MBI, to follow Linux coding style.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:31:01 -08:00
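Note on a79ef76f4d: the "2..4 packets per RTT" follow from the initial window formula of RFC 3390, which RFC 4342 section 5 refers to: min(4*s, max(2*s, 4380)) bytes per RTT. A small sketch, with a hypothetical function name and s given in bytes:

```c
#include <stdint.h>

/* Initial window in bytes per RTT: min(4*s, max(2*s, 4380)). */
static uint32_t initial_window_bytes(uint32_t s)
{
	uint32_t lower = 2 * s > 4380 ? 2 * s : 4380;

	return 4 * s < lower ? 4 * s : lower;
}
```

Small packets (s = 256) may be sent four per RTT, packets of 1460 bytes three per RTT, and large packets (s = 3000) two per RTT.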
Gerrit Renker
5aed324369 [DCCP]: Tidy up unused structures
This removes and cleans up unused variables and structures which have become
unnecessary following the introduction of the EWMA patch to automatically track
the CCID 3 receiver/sender packet sizes `s'.

It deprecates the PACKET_SIZE socket option by returning an error code and
printing a deprecation warning if an application tries to read or write this
socket option.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:30:59 -08:00
Gerrit Renker
78ad713da6 [DCCP] ccid3: Track RX/TX packet size `s' using moving-average
Problem:
2006-12-02 21:30:58 -08:00
Gerrit Renker
2a1fda6f6c [DCCP] ccid3: Set NoFeedback Timeout according to RFC 3448
This corrects the setting of the nofeedback timer with regard to RFC
3448 - previously it was not set to max(4*R, 2*s/X) as specified. Using
the maximum of 1 second as upper bound (as it was done before) can have
detrimental effects, especially if R is small.

Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:30:57 -08:00
Gerrit Renker
5d0dbc4a9b [DCCP] ccid3: Consolidate handling of t_RTO
This patch
 * removes setting t_RTO in ccid3_hc_tx_init (per [RFC 3448, 4.2], t_RTO is
   undefined until feedback has been received);

 * makes some trivial changes (updates of comments);

 * performs a small optimisation by exploiting that the feedback timeout
   uses the value of t_ipi. The way it is done is safe, because the timeouts
   appear after the changes to t_ipi, ensuring that up-to-date values are used;

 * in ccid3_hc_tx_packet_recv, moves the t_rto statement closer to the calculation
   of the next_tmout. This makes the code clearer to read and is also safe, since
   t_rto is not updated until the next call of ccid3_hc_tx_packet_recv, and is not
   read by the functions called via ccid_wait_for_ccid();

 * removes a `max' statement in sk_reset_timer, this is not needed since the timeout
   value is always greater than 1E6 microseconds.

 * adds `XXX'es to highlight that currently the nofeedback timer is set
   in a non-standard way

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:30:52 -08:00
Gerrit Renker
17893bc1a6 [DCCP] ccid3: Consistently update t_nom, t_ipi, t_delta
This patch:

 * consolidates updating of parameters (t_nom, t_ipi, t_delta) which
   need to be updated at the same time, since they are inter-dependent

 * removes two inline functions which are no longer needed as a result of
   the above consolidation

 * resolves a FIXME regarding the re-calculation of t_ipi within the nofeedback
   timer, in the state where no feedback has previously been received

 * ties updating these parameters to updating the sending rate X, exploiting
   that all three parameters in turn depend on X; and using a small optimisation
   which can reduce the number of required instructions: only update the three
   parameters when X really changes

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:30:51 -08:00
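Note on 17893bc1a6: the idea of tying t_ipi, t_nom and t_delta to changes of X can be sketched as below. Everything here is illustrative: the struct, the field names, the microsecond units and the half-granularity constant are assumptions, and the relations t_ipi = s/X and t_delta = min(t_ipi/2, t_gran/2) are taken from RFC 3448 rather than from the patch.

```c
#include <stdint.h>

#define HALF_T_GRAN_US 500u  /* assumed: half of a 1 ms scheduling granularity */

struct tx_sketch {
	uint32_t s;        /* packet size in bytes */
	uint64_t x;        /* sending rate X in bytes per second */
	uint32_t t_ipi;    /* inter-packet interval in microseconds */
	int64_t  t_nom;    /* nominal send time in microseconds */
	uint32_t t_delta;  /* send time tolerance in microseconds */
	unsigned updates;  /* how often the dependent values were recomputed */
};

/* Set a new rate; recompute the dependent parameters only if X changes. */
static void set_rate(struct tx_sketch *tx, uint64_t new_x)
{
	uint32_t t_ipi;

	if (new_x == 0 || new_x == tx->x)
		return;  /* nothing changed: skip the recalculation */

	t_ipi = (uint32_t)((uint64_t)tx->s * 1000000u / new_x);

	/* Shift the nominal send time by the change in t_ipi. */
	tx->t_nom += (int64_t)t_ipi - (int64_t)tx->t_ipi;
	tx->t_ipi = t_ipi;
	tx->t_delta = t_ipi / 2 < HALF_T_GRAN_US ? t_ipi / 2 : HALF_T_GRAN_US;
	tx->x = new_x;
	tx->updates++;
}
```

Calling set_rate() twice with the same rate leaves all three parameters untouched, which is the small optimisation the last bullet refers to.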
Gerrit Renker
48e03eee71 [DCCP] ccid3: Consolidate timer resets
This patch concerns updating the value of the nofeedback timer when no feedback
has been received so far.

Since in this case the value of R is still undefined according to [RFC 3448,
4.2], we can not perform step (3) of [RFC 3448, 4.3].  A clarification is
provided in [RFC 4342, sec. 5], which states that in these cases the nofeedback
timer (still) expires "after two seconds".

Many thanks to Ian McDonald for pointing this out and providing the
clarification.

The patch
  * implements [RFC 4342, sec. 5] with regard to the above case
  * consolidates handling timer restart by
	- adding an appropriate jump label and
	- initialising the timeout value

Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:30:50 -08:00
Gerrit Renker
5e19e3fcd7 [DCCP] ccid3: Resolve small FIXME
This considers the case where an ACK is received while no packet has been sent
so far. Resolved by printing a (rate-limited) warning message.

Further removes an unnecessary BUG_ON in ccid3_hc_tx_packet_recv: feedback
received on a terminating connection is simply ignored.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:30:41 -08:00
Gerrit Renker
70dbd5b0ef [DCCP] ccid3: Remove redundant statements in ccid3_hc_tx_packet_sent
This patch removes a switch statement which is redundant since,
 * nothing is done in states TFRC_SSTATE_NO_SENT/TFRC_SSTATE_NO_FBACK
 * it is impossible that the function is called in the state TFRC_SSTATE_TERM, since
       --the function is called, in dccp_write_xmit, after ccid3_hc_tx_send_packet
       --if ccid3_hc_tx_send_packet is called in state TFRC_SSTATE_TERM, it returns
         -EINVAL, which means that ccid3_hc_tx_packet_sent will not be called
	 (compare dccp_write_xmit)
       --> therefore, this case is logically impossible
 * the remaining state is TFRC_SSTATE_FBACK which conditionally updates t_ipi, t_nom,
   and t_delta. This is a no-op, since
       --t_ipi only changes when feedback is received
       --however, when feedback arrives via ccid3_hc_tx_packet_recv, there is an identical
         code block which performs the same set of operations
       --performing the same set of operations again in ccid3_hc_tx_packet_sent therefore
         does not change anything, since between the time of receiving the last feedback
	 (and therefore update of t_ipi, t_nom, and t_delta), the value of t_ipi has not
	 changed
       --since t_ipi has not changed, the values of t_delta and t_nom also do not change,
         they depend fully on t_ipi

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:30:40 -08:00
Gerrit Renker
da335baf9e [DCCP] ccid3: Avoid congestion control on zero-sized data packets
This resolves an `XXX' in ccid3_hc_tx_send_packet().

The function is only called on Data and DataAck packets and returns a negative
result on zero-sized messages. This is a reasonable policy since CCID 3 is a
congestion-control module and congestion control on zero-sized Data(Ack)
packets is in a way pathological.

The patch uses a more suitable error code for this case: it returns the POSIX.1
code `EBADMSG' ("Not a data message") instead of `ENOTCONN'.

As a result of ignoring zero-sized packets, the condition for a warning
"First packet is data" in ccid3_hc_tx_packet_sent is always satisfied; this
message has been removed since it will always be printed.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:30:39 -08:00
Gerrit Renker
7da7f456d7 [DCCP] ccid3: Simplify control flow of ccid3_hc_tx_send_packet
This makes some logically equivalent simplifications, by replacing
rc - values plus goto's with direct return statements.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:30:38 -08:00
Gerrit Renker
91cf5a1725 [DCCP] ccid3: Fix calculation of t_ipi time of scheduled transmission
Problem:
2006-12-02 21:30:37 -08:00
Gerrit Renker
f5c2d6367b [DCCP] ccid3: Simplify control flow in the calculation of t_ipi
This patch performs a simplifying (performance) optimisation:

 In each call of the inline function ccid3_calc_new_t_ipi(), the state is
 tested against TFRC_SSTATE_NO_FBACK. This is expensive when the function
 is called very often. A simpler solution, implemented by this patch, is
 to adapt the control flow.

Background:
2006-12-02 21:30:36 -08:00
Gerrit Renker
90feeb951f [DCCP] ccid3: Fix bug in calculation of first t_nom and first t_ipi
Problem:
2006-12-02 21:30:35 -08:00
Andrea Bittau
6472c051fc [DCCP] ccid2: Allow window to grow larger
Now that we can stuff bigger ack vectors into options.

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:30:34 -08:00
Ian McDonald
455431739c [DCCP] CCID3: Remove non-referenced variable
This removes a non-referenced variable.

Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:24:41 -08:00
Gerrit Renker
23ea8945f6 [CCID 3]: Add annotations for socket structures
This adds documentation to the CCID 3 rx/tx socket fields, plus some
minor re-formatting.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:24:38 -08:00
Gerrit Renker
59348b19ef [DCCP]: Simplified conditions due to use of enum:8 states
This reaps the benefit of the earlier patch, which changed the type of
CCID 3 states to use enums, in that many conditions are now simplified
and the number of possible (unexpected) values is greatly reduced.

In a few instances, this also allowed pre-conditions to be simplified; care
has been taken to retain logical equivalence.

[DCCP]: Introduce a consistent BUG/WARN message scheme

This refines the existing set of DCCP messages so that
 * BUG(), BUG_ON(), WARN_ON() have meaningful DCCP-specific counterparts
 * DCCP_CRIT (for severe warnings) is not rate-limited
 * DCCP_WARN() is introduced as rate-limited wrapper

Using these allows a faster and cleaner transition to their original
counterparts once the code has matured into a full DCCP implementation.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:24:38 -08:00
Gerrit Renker
56724aa434 [DCCP]: Add CCID3 debug support to Kconfig
This adds a CCID3 debug option to the configuration menu
which is missing in Kconfig, but already used by the code.

CCID 2 already provides such an entry.

To enable debugging, set CONFIG_IP_DCCP_CCID3_DEBUG=y

NOTE: The use of ccid3_{t,r}x_state_name is safe, since
      now only enum values can appear.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:24:36 -08:00
Gerrit Renker
84116716cc [DCCP]: enable debug messages also for static builds
This patch
  * makes debugging (when configured) work both for static / module build
  * provides generic debugging macros for use in other DCCP / CCID modules
  * adds missing information about debug parameters to Kconfig
  * performs some code tidy-up

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:24:35 -08:00
Andrea Bittau
32aac18dfa [DCCP] CCID2: Code optimizations
These are code optimizations which are relevant when dealing with large
windows.  They are not coded the way I would like to, but they do the job for
the short-term.  This patch should be more neat.

Committer note: Changed the seqno comparisons to use {after,before}48 to handle
                wrapping.

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:23:52 -08:00
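Note on 32aac18dfa: wrap-safe comparison of 48-bit sequence numbers, as mentioned in the committer note, can be sketched like this. The function names are hypothetical stand-ins for before48/after48, and the conversion of an out-of-range value to int64_t relies on the usual two's complement behaviour of common compilers.

```c
#include <stdint.h>

#define SEQ48_MAX 0xFFFFFFFFFFFFULL  /* largest 48-bit sequence number */

/* True if s1 precedes s2 in circular 48-bit sequence space. */
static int seq48_before(uint64_t s1, uint64_t s2)
{
	/*
	 * Shift both values into the top 48 bits so that the sign bit of
	 * the 64-bit difference reflects the 48-bit circular distance.
	 */
	return (int64_t)((s1 << 16) - (s2 << 16)) < 0;
}

/* True if s1 follows s2 in circular 48-bit sequence space. */
static int seq48_after(uint64_t s1, uint64_t s2)
{
	return seq48_before(s2, s1);
}
```

A plain `<' would report that 0 comes before SEQ48_MAX; the circular comparison treats 0 as the successor of SEQ48_MAX.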
Gerrit Renker
3c6952624a [DCCP]: Introduce DCCP_{BUG{_ON},CRIT} macros, use enum:8 for the ccid3 states
This patch tackles the following problem:
       * the ccid3_hc_{t,r}x_sock define ccid3hc{t,r}x_state as `u8', but
         in reality there can only be a few, pre-defined enum names
       * this necessitates additional checking for unexpected values
         which would otherwise be caught by the compiler

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:23:49 -08:00
Randy Dunlap
234af48401 [DCCP]: fix printk format warnings
Fix printk format warnings:
build2.out:net/dccp/ccids/ccid2.c:355: warning: long long unsigned int format, u64 arg (arg 3)
build2.out:net/dccp/ccids/ccid2.c:360: warning: long long unsigned int format, u64 arg (arg 3)
build2.out:net/dccp/ccids/ccid2.c:482: warning: long long unsigned int format, u64 arg (arg 5)
build2.out:net/dccp/ccids/ccid2.c:639: warning: long long unsigned int format, u64 arg (arg 3)
build2.out:net/dccp/ccids/ccid2.c:639: warning: long long unsigned int format, u64 arg (arg 4)
build2.out:net/dccp/ccids/ccid2.c:674: warning: long long unsigned int format, u64 arg (arg 3)
build2.out:net/dccp/ccids/ccid2.c:720: warning: long long unsigned int format, u64 arg (arg 3)

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-30 15:24:37 -08:00
Gerrit Renker
0e64e94e47 [DCCP]: Update documentation references.
Updates the references to spec documents throughout the code, taking into
account that

* the DCCP, CCID 2, and CCID 3 drafts all became RFCs in March this year

* RFC 1063 was obsoleted by RFC 1191

* draft-ietf-tcpimpl-pmtud-0x.txt was published as an Informational
  RFC, RFC 2923 on 2000-09-22.

All references verified.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-24 16:17:51 -07:00
Ian McDonald
3dd9a7c3a1 [DCCP]: Use constants for CCIDs
With constants for CCID numbers this now uses them in some places.

Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-09-24 18:03:41 -03:00
Andrea Bittau
593f16aa62 [DCCP] CCID2: Add helper functions for changing important CCID2 state
Introduce methods which manipulate interesting congestion control
state such as pipe and rtt estimate.  This is useful for people
wishing to monitor the variables of CCID and instrument the code
[perhaps using Kprobes].  Personally, I am a fan of
encapsulation---that justifies this change =D.

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:19:42 -07:00
Andrea Bittau
374bcf32c8 [DCCP] CCID2: Halve cwnd once upon multiple losses in a single RTT
When multiple losses occur in one RTT, the window should be halved
only once [a single "congestion event"].  This is now implemented,
although not perfectly.  Slightly changed the interface for changing
the cwnd: pass hctx instead of dp.  This is required in order to allow
for change_cwnd to be called from _init().

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:19:41 -07:00
Andrea Bittau
07978aabd5 [DCCP] CCID2: Allocate seq records on demand
Allocate more sequence state on demand.  Each time a packet is sent
out by CCID2, a record of it needs to be kept.  This list of records
grows proportionally to cwnd.  Previously, the length of this list was
hardcoded and therefore the cwnd could only grow to this value (of
128).  Now, records are allocated on demand as necessary---cwnd may
grow as it wishes.  The exceptional case of when memory is not
available is not handled gracefully.  Perhaps, cwnd should be capped
at that point.

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:19:40 -07:00
Andrea Bittau
8d424f6ca2 [DCCP] CCID2: Add Kconfig option for CCID2 debug
Allow the user to choose whether or not to enable CCID2 debugging via
Kconfig.

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:19:39 -07:00
Andrea Bittau
446dec30c7 [DCCP] CCID2: Tell DCCP to quickly check whether cwnd is available
If not enough cwnd is available, tell the sender to check again as
soon as possible.  This will increase CPU utilization (polling
frequently for cwnd) but will improve network performance.  That is,
the sender will need to wait less before detecting the increase of
cwnd.  A better architecture would be for the CCID to call-back (or
dequeue) from DCCP when it is able to transmit traffic -- not the
other way around as it currently occurs.

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:19:39 -07:00
Andrea Bittau
d458c25ce2 [DCCP] CCID2: Initialize ssthresh to infinity
Initialize the slow-start threshold to infinity.  This way, upon connection
initiation, slow-start will be exited only upon a packet loss.  This patch will
allow connections to quickly gain speed.

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:19:11 -07:00
Andrea Bittau
29651cda97 [DCCP] CCID2: Fix jiffie wrap issues
Jiffies are now handled correctly (I hope) in CCID2.  If they wrap, no
problem.

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:19:10 -07:00
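Note on 29651cda97: the commit does not say how the wrap handling was done. A common idiom for comparing wrapping tick counters, modelled on the kernel's time_after() macro, looks as follows; the function names are hypothetical and this is only one way to achieve the effect.

```c
#include <limits.h>

/* True if tick value a is later than b, even if the counter wrapped. */
static int ticks_after(unsigned long a, unsigned long b)
{
	/* The unsigned difference wraps; its sign tells which one is ahead. */
	return (long)(b - a) < 0;
}

/* True if tick value a is earlier than b. */
static int ticks_before(unsigned long a, unsigned long b)
{
	return ticks_after(b, a);
}
```

The comparison stays correct as long as the two values are less than half the counter range apart.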
Andrea Bittau
8e27e4650c [DCCP] ackvec: Fix how DCCP_ACKVEC_STATE_NOT_RECEIVED is used
Fix the way state is masked out.  DCCP_ACKVEC_STATE_NOT_RECEIVED is
defined as appears in the packet, therefore bit shifting is not
required.  This fix allows CCID2 to correctly detect losses.

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:19:08 -07:00
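Note on 8e27e4650c: in an Ack Vector cell (RFC 4340, section 11.4) the top two bits hold the state and the low six bits the run length, with state 3 meaning "not yet received". The sketch below uses hypothetical names and shows one plausible form of the mistake: shifting the masked state down and then comparing it against a constant that is already in wire position.

```c
#include <stdint.h>

#define AV_STATE_MASK   0xC0  /* top two bits: state */
#define AV_LEN_MASK     0x3F  /* low six bits: run length */
#define AV_NOT_RECEIVED 0xC0  /* state 3, defined as it appears in the packet */

/* Correct test: mask only, no shifting. */
static int av_is_not_received(uint8_t cell)
{
	return (cell & AV_STATE_MASK) == AV_NOT_RECEIVED;
}

/* Plausible buggy variant: the shifted state (0..3) never equals 0xC0. */
static int av_is_not_received_shifted(uint8_t cell)
{
	return ((cell & AV_STATE_MASK) >> 6) == AV_NOT_RECEIVED;
}
```

With the shifted variant no cell is ever classified as lost, which would explain why loss detection in CCID2 did not work before the fix.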
Ian McDonald
fc747e82b4 [DCCP]: Tidyup CCID3 list handling
As Arnaldo Carvalho de Melo points out I should be using list_entry in case
the structure changes in future. Current code functions but is reliant
on position and requires type cast.

Noticed when doing this that I have one more variable than I needed so
removing that also.

Signed off by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:18:33 -07:00
Ian McDonald
66a377c504 [DCCP]: Fix CCID3
This fixes CCID3 to give much closer performance to RFC4342.

CCID3 is meant to alter sending rate based on RTT and loss.

The performance was verified against:
http://wand.net.nz/~perry/max_download.php

For example I tested with netem and had the following parameters:
Delayed Acks 1, MSS 256 bytes, RTT 105 ms, packet loss 5%.

This gives a theoretical speed of 71.9 Kbits/s. I measured across three
runs with this patch set and got 70.1 Kbits/s. Without this patchset the
average was 232 Kbits/s which means Linux can't be used for CCID3 research
properly.

I also tested with netem turned off so box just acting as router with 1.2
msec RTT. The performance with this is the same with or without the patch
at around 30 Mbit/s.

Signed off by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-26 23:40:50 -07:00
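Note on 66a377c504: the theoretical rate quoted above can be reproduced with the RFC 3448 throughput equation. This sketch assumes b = 1 and t_RTO = 4*R, which may or may not be exactly what the referenced calculator uses; the function names are hypothetical and the square root is a hand-rolled Newton iteration to avoid a dependency on libm.

```c
/* Square root by Newton's iteration (keeps the sketch free of libm). */
static double newton_sqrt(double v)
{
	double x = 1.0;
	int i;

	if (v <= 0.0)
		return 0.0;
	for (i = 0; i < 64; i++)
		x = 0.5 * (x + v / x);
	return x;
}

/*
 * Allowed sending rate in bytes per second for packet size s (bytes),
 * round-trip time rtt (seconds) and loss event rate p (0 < p <= 1).
 */
static double tfrc_rate_bytes_per_sec(double s, double rtt, double p)
{
	double f = newton_sqrt(2.0 * p / 3.0) +
		   12.0 * newton_sqrt(3.0 * p / 8.0) * p * (1.0 + 32.0 * p * p);

	return s / (rtt * f);
}
```

For s = 256 bytes, RTT = 0.105 s and p = 0.05 this gives close to 72 kbit/s after multiplying by 8 and dividing by 1000, which agrees with the 71.9 Kbits/s stated in the commit message.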
Ian McDonald
80193aee18 [DCCP]: Introduce dccp_rx_hist_find_entry
This adds a new function dccp_rx_hist_find_entry.

Signed off by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-26 19:07:36 -07:00
Ian McDonald
e6bccd3573 [DCCP]: Update contact details and copyright
Just updating copyright and contacts

Signed off by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-26 19:01:30 -07:00
Ian McDonald
f3166c0717 [DCCP]: Fix typo
This fixes a small typo in net/dccp/ccids/lib/packet_history.c

Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-26 19:01:03 -07:00
Jörn Engel
6ab3d5624e Remove obsolete #include <linux/config.h>
Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-30 19:25:36 +02:00
Arnaldo Carvalho de Melo
2d0817d11e [DCCP] options: Make dccp_insert_options & friends yell on error
Instead of using the silly LIMIT_NETDEBUG and silently returning without
inserting the requested option.

Also drop some old debugging messages associated with option insertion.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:32:06 -08:00
Arnaldo Carvalho de Melo
c0c736db7e [DCCP] ccid2: coding style cleanups
No changes in the logic were made.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:05:37 -08:00
Arnaldo Carvalho de Melo
7247887357 [DCCP] ipv6: Add missing ipv6 control socket
I guess I forgot to add it, nah, now it just works:

18:04:33.274066 IP6 ::1.1476 > ::1.5001: request (service=0)
18:04:33.334482 IP6 ::1.5001 > ::1.1476: reset (code=bad_service_code)

Ditched IP_DCCP_UNLOAD_HACK, as now we would have to do it for both
IPv6 and IPv4, so I'll come up with another way for freeing the
control sockets in upcoming changesets.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:00:37 -08:00
Arnaldo Carvalho de Melo
c25a18ba34 [DCCP]: Uninline some functions
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 21:58:56 -08:00
Arnaldo Carvalho de Melo
057fc6755a [DCCP]: Kconfig tidy up
Make CCID2 and CCID3 default to what was selected for DCCP and use the
standard short description for the CCIDs (TCP-Like & TCP-Friendly).

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 19:24:22 -08:00
Andrea Bittau
60fe62e789 [DCCP]: sparse endianness annotations
This also fixes the layout of dccp_hdr short sequence numbers. The problem
is not fatal for now, as we only support long (48-bit) sequence numbers.

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 19:23:32 -08:00
Arnaldo Carvalho de Melo
91f0ebf7b6 [DCCP] CCID: Improve CCID infrastructure
1. No need for ->ccid_init or ->ccid_exit: this is what module_{init,exit}
   does, and anyway neither ccid2 nor ccid3 was using them.

2. Rename struct ccid to struct ccid_operations and introduce struct ccid
   with a pointer to ccid_operations and, right after it, the rx or tx
   private state.

3. Remove the pointer to the state of the half connections from struct
   dccp_sock; it is now derived thru ccid_priv() from the ccid pointer.

Now we can also easily implement the setsockopt for changing the CCID, as
no ccid init routine can affect struct dccp_sock in a way that prevents
other CCIDs from working if an application asks for a CCID switch.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 19:21:44 -08:00
Andrea Bittau
77ff72d528 [DCCP] CCID2: Drop sock reference count on timer expiration and reset.
There was a hybrid use of standard timers and sk_timers.  This caused
the reference count of the sock to be incorrect when resetting the RTO
timer.  The sock reference count should now be correct, enabling its
destruction, and allowing the DCCP module to be unloaded.

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-03-20 17:57:52 -08:00
Andrea Bittau
2a91aa3967 [DCCP] CCID2: Initial CCID2 (TCP-Like) implementation
Original work by Andrea Bittau; Arnaldo Melo cleaned up and fixed several
issues during the merge process.

For now CCID2 has been made the default for all SOCK_DCCP connections, but
this will be remedied soon with the merge of the feature negotiation code.

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 17:41:47 -08:00
Arnaldo Carvalho de Melo
aa5d7df3b2 [DCCP] CCID3: Set the no_feedback_timer fields near init_timer
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 17:35:13 -08:00
Arnaldo Carvalho de Melo
411447019a [DCCP] CCID: Allow ccid_{init,exit} to be NULL
Testing if the ccid being instantiated has these methods in
ccid_init().

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 17:20:23 -08:00
Ian McDonald
c09966608d [DCCP] ccid3: Divide by zero fix
In rare circumstances 0 is returned by dccp_li_hist_calc_i_mean which
leads to a divide by zero in ccid3_hc_rx_packet_recv. Explicitly check
for zero return now. Update copyright notice at same time.

Found by Arnaldo.

Signed-off-by: Ian McDonald <imcdnzl@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-04 21:06:29 -08:00
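Note on c09966608d: in TFRC the loss event rate is the inverse of the mean
loss interval, so a mean of 0 cannot be used as a divisor. The sketch below
shows only the shape of such a guard. The function name, the scaling factor
and the choice to return None are all assumptions made for illustration;
the commit message does not say what the kernel does once it has detected
the zero.

```python
def loss_event_rate_scaled(i_mean, scale=1_000_000):
    """Return scale / i_mean as an integer, or None if i_mean is zero.

    i_mean -- mean loss interval (hypothetical stand-in for the value
              returned by dccp_li_hist_calc_i_mean)
    scale  -- fixed-point scaling factor (an assumption, for illustration)
    """
    if i_mean == 0:
        # Explicit check: without it the division below would fail.
        return None
    return scale // i_mean


for i_mean in (0, 20, 1000):
    print(i_mean, loss_event_rate_scaled(i_mean))
```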
Al Viro
1b8623545b [PATCH] remove bogus asm/bug.h includes.
A bunch of asm/bug.h includes are both not needed (since it will get
pulled anyway) and bogus (since they are done too early).  Removed.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-02-07 20:56:35 -05:00
Al Viro
dd0fc66fb3 [PATCH] gfp flags annotations - part 1
- added typedef unsigned int __nocast gfp_t;

 - replaced __nocast uses for gfp flags with gfp_t - it gives exactly
   the same warnings as far as sparse is concerned, doesn't change
   generated code (from gcc point of view we replaced unsigned int with
   typedef) and documents what's going on far better.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-08 15:00:57 -07:00
Arnaldo Carvalho de Melo
88f964db6e [DCCP]: Introduce CCID getsockopt for the CCIDs
Allocation for the optnames is similar to the DCCP options, with a
range for rx and tx half connection CCIDs.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-18 00:19:32 -07:00
Arnaldo Carvalho de Melo
65299d6c3c [CCID3]: Introduce include/linux/tfrc.h
Moving the TFRC sender and receiver variables to separate structs, so
that we can copy these structs to userspace thru getsockopt,
dccp_diag, etc.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-18 00:18:32 -07:00
Arnaldo Carvalho de Melo
59c2353dd0 [CCID3]: Listen socks doesn't have a private CCID block
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-12 14:16:58 -07:00
Arnaldo Carvalho de Melo
59d203f9e9 [CCID3] Cleanup ccid3 debug calls
Also use BUG_ON where appropriate, and use LIMIT_NETDEBUG for the unlikely
cases that we want to know about at this stage but that haven't appeared on
the radar in my tests.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-09-09 20:01:25 -03:00
Arnaldo Carvalho de Melo
d7e0fb985c [CCID3] Initialize ccid3hctx_t_ipi to 250ms
To match more closely what is described in RFC 3448.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: Ian McDonald <iam4@cs.waikato.ac.nz>
2005-09-09 19:58:18 -03:00
Arnaldo Carvalho de Melo
59725dc2a2 [CCID3] Introduce ccid3_hc_[rt]x_sk() for overall consistency
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-09-09 02:40:58 -03:00
Arnaldo Carvalho de Melo
b0e567806d [DCCP] Introduce dccp_timestamp
To start the timestamps at 0.0ms, easing the integer maths in the CCIDs.
This will probably be reworked to use the struct timeval_offset
infrastructure that is to be introduced out of skb_get_timestamp, etc.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-09-09 02:38:35 -03:00
Arnaldo Carvalho de Melo
954ee31f36 [CCID3] Initialize more fields in ccid3_hc_rx_init
The initialization of ccid3hcrx_rtt to 5ms is just a band-aid. I'll continue
auditing the CCID3 HC rx codebase to fix this properly, and will probably add
a feedback timer as suggested in the CCID3 draft.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-09-09 02:37:05 -03:00
Arnaldo Carvalho de Melo
b3a3077d96 [CCID3] Make the ccid3hcrx_rtt calc look more like the ccid3hctx_rtt one
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-09-09 02:34:10 -03:00
Arnaldo Carvalho de Melo
1a28599a2c [CCID3] Use ELAPSED_TIME in the HC TX RTT estimation
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-09-09 02:32:56 -03:00
Arnaldo Carvalho de Melo
27ae543e6f [CCID3] Calculate ccid3hcrx_x_recv using usecs_div
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-09-09 02:31:07 -03:00
Arnaldo Carvalho de Melo
507d37cf26 [CCID] Only call the HC insert_options methods when requested
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-09-09 02:30:07 -03:00
Arnaldo Carvalho de Melo
0ba7a3ba66 [CCID3] Avoid unsigned integer overflows in usecs_div
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-09-09 02:28:47 -03:00
Arnaldo Carvalho de Melo
c530cfb1ce [CCID3]: Call sk->sk_write_space(sk) when receiving a feedback packet
This makes the send rate calculations behave much more closely to what
is specified, with the jitter previously seen on x and x_recv
disappearing completely on non-lossy setups.

This resembles the tcp_data_snd_check code, which we may end up using
in DCCP as well, perhaps by moving that code to inet_connection_sock.

For now I'm doing the simplest implementation, though.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 16:13:46 -07:00
Arnaldo Carvalho de Melo
a84ffe4303 [DCCP]: Introduce DCCP_SOCKOPT_PACKET_SIZE
So that applications can set dccp_sock->dccps_pkt_size, that in turn
is used in the CCID3 half connection init routines to set
ccid3hc[tr]x_s and use it in its rate calculations.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 16:13:37 -07:00
Arnaldo Carvalho de Melo
29e4f8b3c3 [CCID3]: Move ccid3_hc_rx_detect_loss to packet_history.c
Renaming it to dccp_rx_hist_detect_loss.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 16:13:17 -07:00
Arnaldo Carvalho de Melo
072ab6c68e [CCID3]: Move ccid3_hc_rx_add_hist to packet_history.c
Renaming it to dccp_rx_hist_add_packet.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 16:13:10 -07:00
Arnaldo Carvalho de Melo
36729c1a73 [DCCP]: Move the calc_X routines to dccp_tfrc_lib
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 16:12:47 -07:00
Arnaldo Carvalho de Melo
5cea0ddce5 [DCCP]: Introduce dccp_tfrc_lib module with net/dccp/ccids/lib/*.c
I'll now take a look at the other proposed TFRC DCCP CCIDs to find
more code that is currently in ccid3.c and can move to this module. The
loss event rate, calc_X, etc. will most probably be moved there.

The main goal of these changes is to pave the way for the
implementation of more TFRC based DCCP CCIDs and to shrink ccid3.c,
reducing its complexity and helping in getting it rock solid.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 16:12:33 -07:00
Arnaldo Carvalho de Melo
4524b25954 [DCCP]: Just move packet_history.[ch] to net/dccp/ccids/lib/
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 16:12:25 -07:00
Arnaldo Carvalho de Melo
ae6706f067 [CCID3]: Move the loss interval code to loss_interval.[ch]
And put this into net/dccp/ccids/lib/, where packet_history.[ch] will also be
moved and then we'll have a tfrc_lib.ko module that will be used by
dccp_ccid3.ko and other CCIDs that are variations of TFRC (RFC 3448).

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 16:12:17 -07:00
Arnaldo Carvalho de Melo
cfc3c525a3 [CCID3]: Move the CCID3 defines to ccid3.h
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 16:12:10 -07:00
Arnaldo Carvalho de Melo
6b5e633ab1 [CCID3]: Introduce usecs_div
To avoid open coding this all over the place.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 16:12:03 -07:00
Arnaldo Carvalho de Melo
b6ee3d4ada [CCID3]: Reorganise timeval handling
Introduce functions to add to or subtract from a timeval variable, and
rename now_delta to timeval_new_delta, which calls do_gettimeofday and
then timeval_delta. timeval_delta should be used directly when several
deltas are taken relative to the current time, or when variables are set
to it, so as to avoid calling do_gettimeofday excessively.

I'm leaving these "timeval_" prefixed functions internal to DCCP for a
while till we're sure there are no subtle bugs in them.

It is also more correct, as it checks if the number of usecs added to
or subtracted from a tv_usec field is more than 2 seconds.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 16:11:56 -07:00
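Note on b6ee3d4ada: the arithmetic behind adding microseconds to a timeval,
subtracting them, and taking the difference of two timevals can be sketched
as follows. This is an illustration only. A timeval is modelled as a
(tv_sec, tv_usec) tuple, the function names are hypothetical, and the kernel
helpers in the patch may normalise differently (the commit mentions a
2-second check that is not reproduced here).

```python
USEC_PER_SEC = 1_000_000


def timeval_add_usecs(tv, usecs):
    """Add usecs to a (tv_sec, tv_usec) pair and renormalise it so that
    0 <= tv_usec < USEC_PER_SEC."""
    sec, usec = tv
    total = sec * USEC_PER_SEC + usec + usecs
    return divmod(total, USEC_PER_SEC)


def timeval_sub_usecs(tv, usecs):
    """Subtract usecs from a (tv_sec, tv_usec) pair, borrowing from
    tv_sec when needed."""
    return timeval_add_usecs(tv, -usecs)


def timeval_delta(later, earlier):
    """Difference between two timevals, in microseconds."""
    return ((later[0] - earlier[0]) * USEC_PER_SEC
            + (later[1] - earlier[1]))


start = (1, 900_000)
end = timeval_add_usecs(start, 250_000)   # carries into tv_sec
print(end, timeval_delta(end, start))
```

Taking the current time once and computing several deltas against that one
value is what avoids repeated clock reads.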
Arnaldo Carvalho de Melo
1f2333aea3 [CCID3]: Reflow to mostly fit under 80 columns
No code changes.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 16:11:46 -07:00
Arnaldo Carvalho de Melo
d6809c12b3 [DCCP]: Introduce dccp_wait_for_ccid and use it in dccp_write_xmit
This is not quite what I think we should have long term, but it improves
performance for now, so let's use it till we get CCID3 working well.
Then we can think about using sk_write_queue, perhaps using some ideas
from Juwen Lai's old stack for 2.4.20.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 16:11:38 -07:00
Eric Dumazet
ba89966c19 [NET]: use __read_mostly on kmem_cache_t , DEFINE_SNMP_STAT pointers
This patch puts mostly read-only data in the right section
(read_mostly), to help sharing of this data between CPUs without
memory ping-pongs.

On one of my production machines, tcp_statistics was sitting in a
heavily modified cache line, so *every* SNMP update had to force a
reload.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 16:11:18 -07:00
Arnaldo Carvalho de Melo
2babe1f6fe [DCCP]: Introduce dccp_get_info
And also hc_tx and hc_rx get_info functions for the CCIDs to fill in
information that is specific to them.

This reuses struct tcp_info for the time being; later I'll try to figure
out a better solution. For now it's really nice to get this kind of info:

[root@qemu ~]# ./ss -danemi
State       Recv-Q Send-Q  Local Addr:Port  Peer Addr:Port
LISTEN      0      0                *:5001          *:*     ino:628 sk:c1340040
         mem:(r0,w0,f0,t0) cwnd:0 ssthresh:0
ESTAB       0      0       172.20.0.2:5001 172.20.0.1:32785 ino:629 sk:c13409a0
         mem:(r0,w0,f0,t0) ts rto:1000 rtt:0.004/0 cwnd:0 ssthresh:0 rcv_rtt:61.377

This, for instance, shows that we're not congestion controlling ACKs.
The above output is from the ttcp receiving host, and ttcp is a one-way
app, i.e. the receiver never calls sendmsg, so ccid_hc_tx_send_packet is
never called. The TX half connection therefore stays in the
TFRC_SSTATE_NO_SENT state and hctx_rtt is never calculated; it keeps the
value set in ccid3_hc_tx_init, 4us, shown above in milliseconds
(0.004ms). Upcoming patches will fix this.

rcv_rtt seems sane tho, matching ping results :-)

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 16:05:07 -07:00
Arnaldo Carvalho de Melo
4fded33b3e [CCID3]: Calculate the RTT in the RX half connection
Using TIMESTAMP_ECHO and ELAPSED_TIME options received.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 16:05:01 -07:00
Arnaldo Carvalho de Melo
c68e64cfb5 [CCID3]: Reintroduce ccid3hctx_t_rto
CCID3 keeps this variable in usecs, inet_connection_socks in jiffies,
so to avoid Mars orbiter losses let's reintroduce ccid3hctx_t_rto 8)

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 16:03:18 -07:00
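Note on c68e64cfb5: the unit mismatch that this commit guards against can be
illustrated with a small conversion sketch. It is not the kernel code. The
tick rate HZ is a build-time setting, so the values used below are
assumptions, and the helper names are hypothetical. Rounding up on the way
to jiffies is chosen here so that a non-zero timeout never becomes zero
ticks.

```python
USEC_PER_SEC = 1_000_000


def usecs_to_jiffies(usecs, hz):
    """Convert microseconds to timer ticks, rounding up."""
    return (usecs * hz + USEC_PER_SEC - 1) // USEC_PER_SEC


def jiffies_to_usecs(jiffies, hz):
    """Convert timer ticks back to microseconds."""
    return jiffies * USEC_PER_SEC // hz


t_rto_usecs = 1_000_000            # a 1 second timeout kept in usecs
for hz in (100, 250, 1000):        # assumed tick rates
    print(hz, usecs_to_jiffies(t_rto_usecs, hz))

# The same number read in the wrong unit is off by orders of magnitude:
# 1_000_000 jiffies at HZ=1000 would be 1000 seconds, not 1 second.
misread_seconds = jiffies_to_usecs(t_rto_usecs, 1000) // USEC_PER_SEC
print(misread_seconds)
```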
Ian McDonald
1bc0986957 [DCCP]: Fix the timestamp options
This changes timestamp, timestamp echo, and elapsed time to use units of 10
usecs as per DCCP spec. This has been tested to verify that times are correct.
Also fixed up length and used hton/ntoh more.

Still to add in later patches:
- actually use elapsed time to adjust RTT
(commented out as was prior to this patch)
- send options at times more closely following the spec
(content is now correct)

Signed-off-by: Ian McDonald <iam4@cs.waikato.ac.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 16:02:34 -07:00
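Note on 1bc0986957: a minimal sketch of the unit conversion described above,
under stated assumptions. The helper names are hypothetical. The packing
assumes a 32-bit timestamp value in network byte order, which is what
hton/ntoh would produce for a 4-byte field. Option type and length bytes are
left out.

```python
import struct

UNIT_USECS = 10  # timestamp options count in units of 10 microseconds


def usecs_to_option_units(usecs):
    """Convert microseconds to 10-usec option units (truncating)."""
    return usecs // UNIT_USECS


def option_units_to_usecs(units):
    """Convert 10-usec option units back to microseconds."""
    return units * UNIT_USECS


def pack_timestamp_value(usecs):
    """Pack a time as a 32-bit big-endian value in 10-usec units."""
    return struct.pack("!I", usecs_to_option_units(usecs) & 0xFFFFFFFF)


def unpack_timestamp_value(data):
    """Unpack a 32-bit big-endian value and return microseconds."""
    (units,) = struct.unpack("!I", data)
    return option_units_to_usecs(units)


wire = pack_timestamp_value(105_000)      # 105 ms
print(wire.hex(), unpack_timestamp_value(wire))
```

Times that are not a multiple of 10 usecs lose their last digit on the wire,
which is the resolution the option format allows.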
Patrick McHardy
a10cedd4b9 [DCCP]: Fix compiler warnings
This may be a false warning if there is always something on ccid3hcrx_hist:

net/dccp/ccids/ccid3.c: In function 'ccid3_hc_rx_packet_recv':
net/dccp/ccids/ccid3.c:1634: warning: 'tstamp.tv_usec' may be used uninitialized in this function
net/dccp/ccids/ccid3.c:1634: warning: 'tstamp.tv_sec' may be used uninitialized in this function

const on inline functions doesn't have any effect:

net/dccp/dccp.h:64: warning: type qualifiers ignored on function return type
net/dccp/dccp.h:70: warning: type qualifiers ignored on function return type
net/dccp/dccp.h:76: warning: type qualifiers ignored on function return type

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 16:00:12 -07:00
Arnaldo Carvalho de Melo
a1d3a35518 [DCCP]: Fix sparse warnings
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:59:59 -07:00
Arnaldo Carvalho de Melo
725ba8eee3 [DCCP]: Introduce the DCCP Kernel hacking menu
Only available if CONFIG_DEBUG_KERNEL is enabled in the "Kernel
Hacking" Menu.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:59:43 -07:00
Arnaldo Carvalho de Melo
c173437669 [PACKET_HISTORY]: Add dccphtx_rtt and rename the win_count fields
As requested by Ian.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: Ian McDonald <iam4@cs.waikato.ac.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:59:17 -07:00
Arnaldo Carvalho de Melo
cef07fd602 [CCID3]: Ditch USEC_IN_SEC as time.h has USEC_PER_SEC
That is equivalent, no need to have a private one.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-08-29 15:56:33 -07:00
Arnaldo Carvalho de Melo
8c60f3fab5 [CCID3]: Separate most of the packet history code
This also changes the list_for_each_entry_safe_continue behaviour to match its
kerneldoc comment, that is, to start after the pos passed.

Also adds several helper functions from previously open-coded fragments, making
the code clearer.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-08-29 15:56:28 -07:00
Arnaldo Carvalho de Melo
27258ee54f [DCCP]: Introduce dccp_write_xmit from code in dccp_sendmsg
This way it gets closer to the TCP flow, where congestion window
checks are done. It seems we can map ccid_hc_tx_send_packet in
dccp_write_xmit to tcp_snd_wnd_test in tcp_write_xmit; a CCID2
decision should just fit in here as well...

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:55:18 -07:00
Arnaldo Carvalho de Melo
757f612e09 [CCID3]: Reenable list_for_each_entry_safe_continue usage
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:50:08 -07:00
Arnaldo Carvalho de Melo
7c657876b6 [DCCP]: Initial implementation
Development to this point was done on a subversion repository at:

http://oops.ghostprotocols.net:81/cgi-bin/viewcvs.cgi/dccp-2.6/

This repository will be kept at this site for the foreseeable future,
so that interested parties can see the history of this code,
attributions, etc.

If I ever decide to take this offline I'll provide the full history at
some other suitable place.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:49:46 -07:00