linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-12-14 06:24:53 +08:00

Author	SHA1	Message	Date
Ilpo Järvinen	90840defab	[TCP]: Introduce tcp_wnd_end() to reduce line lengths Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:00:22 -08:00
Ilpo Järvinen	3ccd3130b3	[TCP]: Make invariant check complain about invalid sacked_out Earlier resolution for NewReno's sacked_out should now keep it small enough for this to become invariant-like check. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:00:20 -08:00
Hideo Aoki	3ab224be6d	[NET] CORE: Introducing new memory accounting interface. This patch introduces new memory accounting functions for each network protocol. Most of them are renamed from memory accounting functions for stream protocols. At the same time, some stream memory accounting functions are removed since other functions do the same thing. Renaming: sk_stream_free_skb() -> sk_wmem_free_skb() __sk_stream_mem_reclaim() -> __sk_mem_reclaim() sk_stream_mem_reclaim() -> sk_mem_reclaim() sk_stream_mem_schedule -> __sk_mem_schedule() sk_stream_pages() -> sk_mem_pages() sk_stream_rmem_schedule() -> sk_rmem_schedule() sk_stream_wmem_schedule() -> sk_wmem_schedule() sk_charge_skb() -> sk_mem_charge() Removing: sk_stream_rfree(): consolidates into sock_rfree() sk_stream_set_owner_r(): consolidates into skb_set_owner_r() sk_stream_mem_schedule() The following functions are added. sk_has_account(): check if the protocol supports accounting sk_mem_uncharge(): do the opposite of sk_mem_charge() In addition, to achieve consolidation, updating sk_wmem_queued is removed from sk_mem_charge(). Next, to consolidate memory accounting functions, this patch adds memory accounting calls to network core functions. Moreover, present memory accounting call is renamed to new accounting call. Finally we replace present memory accounting calls with new interface in TCP and SCTP. Signed-off-by: Takahiro Yasui <tyasui@redhat.com> Signed-off-by: Hideo Aoki <haoki@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:00:18 -08:00
Ilpo Järvinen	c776ee01bd	[TCP]: Remove seq_rtt ptr from clean_rtx_queue args While checking Gavin's patch I noticed that the returned seq_rtt is not used by the caller. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:00:07 -08:00
Eric Dumazet	dfd4f0ae2e	[TCP]: Avoid two divides in __tcp_grow_window() tcp_win_from_space() being signed, compiler might emit an integer divide to compute tcp_win_from_space()/2. Using right shifts is OK here and less expensive. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:00:01 -08:00
Ilpo Järvinen	6859d49475	[TCP]: Abstract tp->highest_sack accessing & point to next skb Pointing to the next skb is necessary to avoid referencing already SACKed skbs which will soon be on a separate list. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:46 -08:00
Ilpo Järvinen	7201883599	[TCP]: Cleanup local variables of clean_rtx_queue Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:46 -08:00
Ilpo Järvinen	ea60658cde	[TCP]: Add unlikely() to urgent handling in clean_rtx_queue Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:45 -08:00
Ilpo Järvinen	89d478f7f2	[TCP]: Remove duplicated code block from clean_rtx_queue Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:44 -08:00
Ilpo Järvinen	c3a05c6050	[TCP]: Cong.ctrl modules: remove unused good_ack from cong_avoid Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:41 -08:00
Ilpo Järvinen	ede9f3b186	[TCP]: Unite identical code from two seqno split blocks Bogus seqno compares just mislead, the code is identical for both sides of the seqno compare (and was even executed just once because of return in between). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:41 -08:00
Ilpo Järvinen	407ef1de03	[TCP]: Remove superfluous FLAG_DATA_SACKED To get there, highest_sack must have advanced. When it advances, a new skb is SACKed, which already sets that FLAG. Besides, the original purpose of it has puzzled me, never understood why LOST bit setting of retransmitted skb is marked with FLAG_DATA_SACKED. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:40 -08:00
Ilpo Järvinen	bce392f3b0	[TCP]: Move LOSTRETRANS MIB outside !(L|S) check Usually those skbs will have L set, not counting them as lost retransmissions is misleading. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:39 -08:00
Ilpo Järvinen	ea4f76ae13	[TCP]: Two fixes to new sacktag code 1) Skip condition used to be wrong way around which made SACK processing very broken, missed many blocks because of that. 2) Use highest_sack advancement only if some skbs are already sacked because otherwise tcp_write_queue_next may move things too far (occurs mainly with GSO). The other similar advancement is not a problem because highest_sack was previously put to point a sacked skb. These problems were located because of problem report from Matt Mathis <mathis@psc.edu>. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:10 -08:00
Pavel Emelyanov	8d8ad9d7c4	[NET]: Name magic constants in sock_wake_async() The sock_wake_async() performs a bit different actions depending on "how" argument. Unfortunately this argument only has numerical magic values. I propose to give names to their constants to help people reading this function callers understand what's going on without looking into this function all the time. I suppose this is 2.6.25 material, but if it's not (or the naming seems poor/bad/awful), I can rework it against the current net-2.6 tree. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:03 -08:00
Ilpo Järvinen	20de20beba	[TCP]: Correct DSACK check placing Previously one of the in-block skip branches was missing it. Also, drop it from tail-fully-processed case because the next iteration will do exactly the same thing, i.e., process the SACK block that contains the DSACK information. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:15 -08:00
Ilpo Järvinen	68f8353b48	[TCP]: Rewrite SACK block processing & sack_recv_cache use Key points of this patch are: - In case new SACK information is advance only type, no skb processing below previously discovered highest point is done - Optimize cases below highest point too since there's no need to always go up to highest point (which is very likely still present in that SACK), this is not entirely true though because I'm dropping the fastpath_skb_hint which could previously optimize those cases even better. Whether that's significant, I'm not too sure. Currently it will provide skipping by walking. Combined with RB-tree, all skipping would become fast too regardless of window size (can be done incrementally later). Previously a number of cases in TCP SACK processing failed to take advantage of costly stored information in sack_recv_cache, most importantly, expected events such as cumulative ACK and new hole ACKs. Processing such ACKs results in rather long walks building up latencies (which easily gets nasty when window is huge). Those latencies are often completely unnecessary compared with the amount of _new_ information received, usually for cumulative ACK there's no new information at all, yet TCP walks the whole queue unnecessarily, potentially taking a number of costly cache misses on the way, etc.! Since the inclusion of highest_sack, there's a lot of information that is very likely redundant (SACK fastpath hint stuff, fackets_out, highest_sack), though there's no ultimate guarantee that they'll remain the same the whole time (in all unearthly scenarios). Take advantage of this knowledge here and drop fastpath hint and use direct access to highest SACKed skb as a replacement. Effectively "special cased" fastpath is dropped. This change adds some complexity to introduce a better-covered "fastpath", though the added complexity should make TCP behave more cache friendly. The current ACK's SACK blocks are compared against each cached block individually and only ranges that are new are then scanned by the high constant walk. For other parts of write queue, even when in previously known part of the SACK blocks, a faster skip function is used (if necessary at all). In addition, whenever possible, TCP fast-forwards to highest_sack skb that was made available by an earlier patch. In the typical case, no other things but this fast-forward and mandatory markings after that occur making the access pattern quite similar to the former fastpath "special case". DSACKs are special case that must always be walked. The local to recv_sack_cache copying could be more intelligent w.r.t DSACKs which are likely to be there only once but that is left to a separate patch. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:07 -08:00
Ilpo Järvinen	fd6dad616d	[TCP]: Earlier SACK block verification & simplify access to them Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:07 -08:00
Ilpo Järvinen	9e10c47cb9	[TCP]: Create tcp_sacktag_one(). Worker function that implements the main logic of the inner-most loop of tcp_sacktag_write_queue(). Idea was originally presented by David S. Miller. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:06 -08:00
Ilpo Järvinen	b7d4815f35	[TCP]: Prior_fackets can be replaced by highest_sack seq Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:05 -08:00
Ilpo Järvinen	9f58f3b721	[TCP]: Make lost retrans detection more self-contained Highest_sack_end_seq is no longer calculated in the loop, thus it can be pushed to the worker function altogether making that function independent of the sacktag. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:04 -08:00
Ilpo Järvinen	a47e5a988a	[TCP]: Convert highest_sack to sk_buff to allow direct access It is going to replace the sack fastpath hint quite soon... :-) Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:03 -08:00
Ilpo Järvinen	85cc391c0e	[TCP]: non-FACK SACK follows conservative SACK loss recovery Many assumptions that are true when no reordering or other strange events happen are not a part of the RFC3517. FACK implementation is based on such assumptions. Previously (before the rewrite) the non-FACK SACK was basically doing fast rexmit and then it times out all skbs when first cumulative ACK arrives, which cannot really be called SACK based recovery :-). RFC3517 SACK disables these things: - Per SKB timeouts & head timeout entry to recovery - Marking at least one skb while in recovery (RFC3517 does this only for the fast retransmission but not for the other skbs when cumulative ACKs arrive in the recovery) - Sacktag's loss detection flavors B and C (see comment before tcp_sacktag_write_queue) This does not implement the "last resort" rule 3 of NextSeg, which allows retransmissions also when not enough SACK blocks have yet arrived above a segment for IsLost to return true [RFC3517]. The implementation differs from RFC3517 in these points: - Rate-halving is used instead of FlightSize / 2 - Instead of using dupACKs to trigger the recovery, the number of SACK blocks is used as FACK does with SACK blocks+holes (which provides a more accurate number). It seems that the difference can affect negatively only if the receiver does not generate SACK blocks at all even though it claimed to be SACK-capable. - Dupthresh is not a constant one. Dynamical adjustments include both holes and sacked segments (equal to what FACK has) due to complexity involved in determining the number of sacked blocks between highest_sack and the reordered segment. Thus it will be an over-estimate. Implementation note: tcp_clean_rtx_queue doesn't need a lost_cnt tweak because head skb at that point cannot be SACKED_ACKED (nor would such situation last for long enough to cause problems). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:03 -08:00
Ilpo Järvinen	f577111302	[TCP]: Extend reordering detection to cover CA_Loss partially This implements more accurately what is stated in sacktag's overall comment: "Both of these heuristics are not used in Loss state, when we cannot account for retransmits accurately." When CA_Loss state is entered, the state changer ensures that undo_marker is only set if no TCPCB_RETRANS skbs were found, thus having non-zero undo_marker in CA_Loss basically tells that the R-bits still accurately reflect the current state of TCP. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:02 -08:00
Ilpo Järvinen	b9d86585dc	[TCP]: Move !in_sack test earlier in sacktag & reorganize if()s All intermediate conditions include it already, make them simpler as well. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:01 -08:00
Gavin McCullagh	2072c228c9	[TCP]: use non-delayed ACK for congestion control RTT When a delayed ACK representing two packets arrives, there are two RTT samples available, one for each packet. The first (in order of seq number) will be artificially long due to the delay waiting for the second packet, the second will trigger the ACK and so will not itself be delayed. According to rfc1323, the SRTT used for RTO calculation should use the first rtt, so receivers echo the timestamp from the first packet in the delayed ack. For congestion control however, it seems measuring delayed ack delay is not desirable as it varies independently of congestion. The patch below causes seq_rtt and last_ackt to be updated with any available later packet rtts which should have less (and hopefully zero) delack delay. The rtt value then gets passed to ca_ops->pkts_acked(). Where TCP_CONG_RTT_STAMP was set, effort was made to suppress RTTs from within a TSO chunk (!fully_acked), using only the final ACK (which includes any TSO delay) to generate RTTs. This patch removes these checks so RTTs are passed for each ACK to ca_ops->pkts_acked(). For non-delay based congestion control (cubic, h-tcp), rtt is sometimes used for rtt-scaling. In shortening the RTT, this may make them a little less aggressive. Delay-based schemes (eg vegas, veno, illinois) should get a cleaner, more accurate congestion signal, particularly for small cwnds. The congestion control module can potentially also filter out bad RTTs due to the delayed ack alarm by looking at the associated cnt which (where delayed acking is in use) should probably be 1 if the alarm went off or greater if the ACK was triggered by a packet. Signed-off-by: Gavin McCullagh <gavin.mccullagh@nuim.ie> Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-12-29 19:11:21 -08:00
Satoru SATOH	488faa2ae3	[IPV4]: Make tcp_input_metrics() get minimum RTO via tcp_rto_min() tcp_input_metrics() refers to the built-time constant TCP_RTO_MIN regardless of configured minimum RTO with iproute2. Signed-off-by: Satoru SATOH <satoru.satoh@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-12-16 14:00:19 -08:00
Ilpo Järvinen	52d3408150	[TCP]: Move prior_in_flight collect to more robust place The previous location is after sacktag processing, which affects counters tcp_packets_in_flight depends on. This may manifest as wrong behavior if new SACK blocks are present and all is clear for call to tcp_cong_avoid, which in the case of tcp_reno_cong_avoid bails out early because it thinks that TCP is not limited by cwnd. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-12-05 05:37:30 -08:00
Ilpo Järvinen	3e6f049e0c	[TCP] FRTO: Use of existing funcs make code more obvious & robust Though there's little need for everything that tcp_may_send_now does (actually, even the state had to be adjusted to pass some checks FRTO does not want to occur), it's more robust to let it make the decision if sending is allowed. State adjustments needed: - Make sure snd_cwnd limit is not hit in there - Disable nagle (if necessary) through the frto_counter == 2 The result of check for frto_counter in argument to call for tcp_enter_frto_loss can just be open coded, therefore there isn't need to store the previous frto_counter past tcp_may_send_now. In addition, returns can then be combined. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-12-05 05:37:29 -08:00
Ilpo Järvinen	e1cd8f78f8	[TCP] FRTO: Clear frto_highmark only after process_frto that uses it I broke this in commit `3de96471bd`: [TCP]: Wrap-safed reordering detection FRTO check tcp_process_frto should always see a valid frto_highmark. An invalid frto_highmark (zero) is very likely what ultimately caused a seqno compare in tcp_frto_enter_loss to do the wrong thing, leading to the LOST-bit leak. Having LOST-bit integrity ensured like done after commit `23aeeec365`: [TCP] FRTO: Plug potential LOST-bit leak won't hurt. It may still be useful in some other, possibly legitimate, scenario. Reported by Chazarain Guillaume <guichaz@yahoo.fr>. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-14 15:55:09 -08:00
Ilpo Järvinen	96a2d41a3e	[TCP]: Make sure write_queue_from does not begin with NULL ptr NULL ptr can be returned from tcp_write_queue_head to cached_skb and then assigned to skb if packets_out was zero. Without this, system is vulnerable to carefully crafted ACKs, which obviously is remotely triggerable. Besides, there's very little that needs to be done in sacktag if there weren't any packets outstanding, just skipping the rest doesn't hurt. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-14 15:47:18 -08:00
Ilpo Järvinen	23aeeec365	[TCP] FRTO: Plug potential LOST-bit leak It might be possible that, in some extreme scenario that I just cannot now construct in my mind, end_seq <= frto_highmark check does not match, causing the lost_out and LOST bits to become out-of-sync due to clearing and recounting in the loop. This may fix LOST-bit leak reported by Chazarain Guillaume <guichaz@yahoo.fr>. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-13 21:03:13 -08:00
Ilpo Järvinen	746aa32d28	[TCP] FRTO: Limit snd_cwnd if TCP was application limited Otherwise TCP might violate packet ordering principles that FRTO is based on. If conventional recovery path is chosen, this won't be significant at all. In practice, any small enough value will be sufficient to provide proper operation for FRTO, yet other users of snd_cwnd might benefit from a "close enough" value. FRTO's formula is now equal to what tcp_enter_cwr() uses. FRTO used to check application limitedness a bit differently but I changed that in commit `575ee7140d` and as a result checking for application limitedness became completely non-existing. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-13 21:01:23 -08:00
Ilpo Järvinen	fbd52eb2bd	[TCP]: Split SACK FRTO flag clearing (fixes FRTO corner case bug) In case we run out of mem when fragmenting, the clearing of FLAG_ONLY_ORIG_SACKED might get missed which then feeds FRTO with false information. Move clearing outside skb processing loop so that it will get executed even if the skb loop terminates prematurely due to out-of-mem. Besides, now the core of the loop truly deals with a single skb only, which also enables creation of a more self-contained tcp_sacktag_one later on. In addition, small reorganization of if branches was made. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-10 21:24:19 -08:00
Ilpo Järvinen	e49aa5d456	[TCP]: Add unlikely() to sacktag out-of-mem in fragment case Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-10 21:23:08 -08:00
Ilpo Järvinen	c7caf8d3ed	[TCP]: Fix reord detection due to snd_una covered holes Fixes subtle bug like the one with fastpath_cnt_hint happening due to the way the GSO and hints interact. Because hints are not reset when just a GSOed skb is partially ACKed, there's no guarantee that the relevant part of the write queue is going to be processed in sacktag at all (skbs below snd_una) because fastpath hint can fast forward the entrypoint. This was also in the way of future reductions in sacktag's skb processing. Also future cleanups in sacktag can be made after this (in 2.6.25). This may make reordering update in tcp_try_undo_partial redundant but I'm not too sure so I left it there. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-10 21:22:18 -08:00
Ilpo Järvinen	8dd71c5d28	[TCP]: Consider GSO while counting reord in sacktag Reordering detection fails to take account that the reordered skb may have pcount larger than 1. In such case the lowest of them had the largest reordering, the old formula used the highest of them which is pcount - 1 packets less reordered. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-10 21:20:59 -08:00
Ilpo Järvinen	261ab365fa	[TCP]: Another TAGBITS -> SACKED_ACKED|LOST conversion Similar to commit `3eec0047d9`, point of this is to avoid skipping R-bit skbs. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-01 00:10:18 -07:00
Ilpo Järvinen	e56d6cd605	[TCP]: Process DSACKs that reside within a SACK block DSACK inside another SACK block were missed if start_seq of DSACK was larger than SACK block's because sorting prioritizes full processing of the SACK block before DSACK. After SACK block sorting situation is like this: SSSSSSSSS D SSSSSS SSSSSSS Because write_queue is walked in-order, when the first SACK block has been processed, TCP is already past the skb for which the DSACK arrived and we haven't taught it to backtrack (nor should we), so TCP just continues processing by going to the next SACK block after the DSACK (if any). Whenever such DSACK is present, do an embedded checking during the previous SACK block. If the DSACK is below snd_una, there won't be overlapping SACK block, and thus no problem in that case. Also if start_seq of the DSACK is equal to the actual block, it will be processed first. Tested this by using netem to duplicate 15% of packets, and by printing SACK block when found_dup_sack is true and the selected skb in the dup_sack = 1 branch (if taken): SACK block 0: 4344-5792 (relative to snd_una 2019137317) SACK block 1: 4344-5792 (relative to snd_una 2019137317) equal start seqnos => next_dup = 0, dup_sack = 1 won't occur... SACK block 0: 5792-7240 (relative to snd_una 2019214061) SACK block 1: 2896-7240 (relative to snd_una 2019214061) DSACK skb match 5792-7240 (relative to snd_una) ...and next_dup = 1 case (after the not shown start_seq sort), went to dup_sack = 1 branch. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-01 00:09:37 -07:00
Ryousei Takano	94d3b1e586	[TCP]: fix D-SACK cwnd handling In the current net-2.6 kernel, handling FLAG_DSACKING_ACK is broken. The flag is cleared to 1 just after FLAG_DSACKING_ACK is set. if (found_dup_sack) flag |= FLAG_DSACKING_ACK; : flag = 1; To fix it, this patch introduces a part of the tcp_sacktag_state patch: http://marc.info/?l=linux-netdev&m=119210560431519&w=2 Signed-off-by: Ryousei Takano <takano-ryousei@aist.go.jp> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-26 04:27:59 -07:00
Adrian Bunk	0f79efdc23	[TCP]: Make tcp_match_skb_to_sack() static. tcp_match_skb_to_sack() can become static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-26 03:57:36 -07:00
Ryousei Takano	564262c1f0	[TCP]: Fix inconsistency of terms. Fix inconsistency of terms: 1) D-SACK 2) F-RTO Signed-off-by: Ryousei Takano <takano-ryousei@aist.go.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-25 23:03:52 -07:00
Chuck Lever	c2636b4d9e	[NET]: Treat the sign of the result of skb_headroom() consistently In some places, the result of skb_headroom() is compared to an unsigned integer, and in others, the result is compared to a signed integer. Make the comparisons consistent and correct. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-23 21:27:55 -07:00
Ilpo Järvinen	df2e014bfb	[TCP]: Remove lost_retrans zero seqno special cases Both high-sack detection and new lowest seq variables have unnecessary zero special case which are now removed by setting safe initial seqnos. This also fixes problem which caused zero received_upto being passed to tcp_mark_lost_retrans which confused after relations within the marker loop causing incorrect TCPCB_SACKED_RETRANS clearing. The problem was noticed because of a performance report from TAKANO Ryousei <takano@axe-inc.co.jp>. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Acked-by: Ryousei Takano <takano-ryousei@aist.go.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-18 05:07:57 -07:00
Ilpo Järvinen	f885c5b08e	[TCP]: high_seq parameter removed (all callers use tp->high_seq) Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-15 12:26:37 -07:00
Ilpo Järvinen	b08d6cb22c	[TCP]: Limit processing lost_retrans loop to work-to-do cases This addition of lost_retrans_low to tcp_sock might be unnecessary, it's not clear how often lost_retrans worker is executed when there wasn't work to do. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-11 17:36:13 -07:00
Ilpo Järvinen	f785a8e28b	[TCP]: Fix lost_retrans loop vs fastpath problems Detection implemented with lost_retrans must work also when fastpath is taken, yet most of the queue is skipped including (very likely) those retransmitted skb's we're interested in. This problem appeared when the hints got added, which removed a need to always walk over the whole write queue head. Therefore decision for the lost_retrans worker loop entry must be separated from the sacktag processing more than it was necessary before. It turns out to be problematic to optimize the worker loop very heavily because ack_seqs of skb may have a number of discontinuity points. Maybe similar approach as currently is implemented could be attempted but that's becoming more and more complex because the trend is towards less skb walking in sacktag marker. Trying a simple work until all rexmitted skbs have been processed approach. Maybe after(highest_sack_end_seq, tp->high_seq) checking is not sufficiently accurate and causes entry too often in no-work-to-do cases. Since that's not known, I've separated solution to that from this patch. Noticed because of report against a related problem from TAKANO Ryousei <takano@axe-inc.co.jp>. He also provided a patch to that part of the problem. This patch includes solution to it (though this patch has to use somewhat different placement). TAKANO's description and patch is available here: http://marc.info/?l=linux-netdev&m=119149311913288&w=2 ...In short, TAKANO's problem is that end_seq the loop is using not necessarily the largest SACK block's end_seq because the current ACK may still have higher SACK blocks which are later by the loop. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-11 17:35:41 -07:00
Ilpo Järvinen	4cd829995b	[TCP]: No need to re-count fackets_out/sacked_out at RTO Both sacked_out and fackets_out are directly known from how parameter. Since fackets_out is accurate, there's no need for recounting (sacked_out was previously unnecessarily counted in the loop anyway). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-11 17:34:57 -07:00
Ilpo Järvinen	d193594299	[TCP]: Extract tcp_match_queue_to_sack from sacktag code This is necessary for upcoming DSACK bugfix. Reduces sacktag length which is not very sad thing at all... :-) Notice that there's a need to handle out-of-mem at caller's place. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-11 17:34:25 -07:00
Ilpo Järvinen	f6fb128d27	[TCP]: Kill almost unused variable pcount from sacktag It's on the way for future cutting of that function. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-11 17:33:55 -07:00
Ilpo Järvinen	3eec0047d9	[TCP]: Fix mark_head_lost to ignore R-bit when trying to mark L This condition (plain R) can arise at least in recovery that is triggered after tcp_undo_loss. There isn't any reason why they should not be marked as lost, not marking makes in_flight estimator to return too large values. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-11 17:33:11 -07:00
Ilpo Järvinen	16e906812f	[TCP]: Add bytes_acked (ABC) clearing to FRTO too I was reading tcp_enter_loss while looking for Cedric's bug and noticed bytes_acked adjustment is missing from FRTO side. Since bytes_acked will only be used in tcp_cong_avoid, I think it's safe to assume RTO would be spurious. During FRTO cwnd will not be controlled by tcp_cong_avoid and if FRTO calls for conventional recovery, cwnd is adjusted and the result of wrong assumption is cleared from bytes_acked. If RTO was in fact spurious, we did normal ABC already and can continue without any additional adjustments. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-11 17:32:31 -07:00
Ilpo Järvinen	1c1e87edb9	[TCP]: Separate lost_retrans loop into own function Follows own function for each task principle, this is really somewhat separate task being done in sacktag. Also reduces indentation. In addition, added ack_seq local var to break some long lines & fixed coding style things. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:55:51 -07:00
Stephen Hemminger	cfcabdcc2d	[NET]: sparse warning fixes Fix a bunch of sparse warnings. Mostly about 0 used as NULL pointer, and shadowed variable declarations. One notable case was that hash size should have been unsigned. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:54:48 -07:00
Ilpo Järvinen	de83c058af	[TCP]: "Annotate" another fackets_out state reset This should no longer be necessary because fackets_out is accurate. It indicates bugs elsewhere, thus report it. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:54:48 -07:00
Ilpo Järvinen	3de96471bd	[TCP]: Wrap-safed reordering detection FRTO check In case somebody has a suggestion about a better place for this check, which must guarantee execution "early enough" (i.e., before the wrap can occur), I'm very open to them. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:54:00 -07:00
Ilpo Järvinen	0e835331e3	[TCP]: Update comment of SACK block validator Just came across what RFC2018 states about generation of valid SACK blocks in case of reneging. Alter the comment a bit to point it out clearly. IMHO, there isn't any reason to change code because the validation is there for a purpose (counters will inform the user about the decision TCP made if this case ever surfaces). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:53:59 -07:00
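The "early enough" requirement follows from how wrap-safe sequence comparisons work. Here is a sketch in the style of the kernel's before()/after() helpers (the names seq_before/seq_after are hypothetical). The signed 32-bit difference orders two sequence numbers correctly only while they are less than 2^31 apart, so a saved sequence number that is checked too late compares the wrong way.

```c
#include <assert.h>
#include <stdint.h>

/* Wrap-safe comparisons of 32-bit TCP sequence numbers, modelled on the
 * kernel's before()/after(). Correct only while the two values are less
 * than 2^31 apart; beyond that the result flips. */
static inline int seq_before(uint32_t seq1, uint32_t seq2)
{
	/* Converting an out-of-range value to int32_t is implementation-defined
	 * in C; GCC and Clang use two's complement wraparound. */
	return (int32_t)(seq1 - seq2) < 0;
}

/* seq_after(a, b): is a after b? */
static inline int seq_after(uint32_t a, uint32_t b)
{
	return seq_before(b, a);
}
```

For instance, seq_before(0xFFFFFFF0, 0x10) holds even though a plain unsigned `<` says the opposite.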
Ilpo Järvinen	95eacd27e2	[TCP]: fix comments that got messed up during code move Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:53:59 -07:00
Ilpo Järvinen	912d8f0b1f	[TCP] MIB: Count FRTO's successfully detected spurious RTOs Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:39 -07:00
Ilpo Järvinen	93e6802029	[TCP]: Reordered ACK's (old) SACKs not included to discarded MIB In case of ACK reordering, the SACK block might be valid in its time but is already obsoleted since we've received another kind of confirmation about arrival of the segments through snd_una advancement of an earlier packet. I didn't bother to build a distinction between valid and invalid SACK blocks but simply made reordered SACK blocks that are too old never counted, regardless of their "real" validity, which could be determined by using the ack field of the reordered packet (won't be significant IMHO). DSACKs can very well be considered useful even in this situation, so won't do any of this for them. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:38 -07:00
Ilpo Järvinen	b76892051c	[TCP]: Avoid clearing sacktag hint in trivial situations There's no reason to clear the sacktag skb hint when a small part of the rexmit queue changes. Account changes (if any) instead when fragmenting/collapsing. RTO/FRTO do not touch SACKED_ACKED bits so there is no need to discard the SACK tag hint at all. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:12 -07:00
Ilpo Järvinen	c96fd3d461	[TCP]: Enable SACK enhanced FRTO (RFC4138) by default Most of the description that follows comes from my mail to netdev (some editing done): The main obstacle to FRTO use is its deployment, as it has to be on the sender side whereas the wireless link is often the receiver's access link. Take initiative on behalf of unlucky receivers and enable it by default in future Linux TCP senders. Also IETF seems to be interested in advancing FRTO from experimental [1]. How does FRTO help? =================== FRTO detects spurious RTOs and avoids a number of unnecessary retransmissions and a couple of other problems that can arise due to an incorrect guess made at RTO (i.e., that segments were lost when they actually got delayed, which is likely to occur e.g. in wireless environments with link-layer retransmission). Though FRTO cannot prevent the first (potentially unnecessary) retransmission at RTO, I suspect that it won't cost that much even if you have to pay for each bit (won't be that high percentage out of all packets after all :-)). However, usually when you have a spurious RTO, not only the first segment is unnecessarily retransmitted but the whole window. It goes like this: all cumulative ACKs got delayed due to in-order delivery, then TCP will actually send 1.5*original cwnd worth of data in the RTO's slow-start when the delayed ACKs arrive (basically the original cwnd worth of it unnecessarily). In case one is interested in minimizing unnecessary retransmissions e.g. due to cost, those rexmissions must never see daylight. Besides, in the worst case the generated burst overloads the bottleneck buffers, which is likely to significantly delay the further progress of the flow. In case of ll rexmissions, ACK compression often occurs at the same time, making the burst very "sharp edged" (in that case TCP often loses most of the segments above high_seq => very bad performance too).
When FRTO is enabled, those unnecessary retransmissions are fully avoided except for the first segment, and the cwnd behavior after a detected spurious RTO is determined by the response (one can tune that by sysctl). With the basic version (non-SACK enhanced one), FRTO can fail to detect a spurious RTO as spurious and falls back to conservative behavior. ACK lossage is much less significant than reordering; usually FRTO can detect a spurious RTO if at least 2 cumulative ACKs from the original window are preserved (excluding the ACK that advances to high_seq). With the SACK-enhanced version, the detection is quite robust. FRTO should remove the need to set a high lower bound for the RTO estimator due to delay spikes that occur relatively commonly in some environments (esp. in wireless/cellular ones). [1] http://www1.ietf.org/mail-archive/web/tcpm/current/msg02862.html Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:12 -07:00
Ilpo Järvinen	009a2e3e4e	[TCP] FRTO: Improve interoperability with other undo_marker users Basically this change enables it; previously other undo_marker users were left with nothing. Reverse undo_marker logic completely to get it set right in CA_Loss. On the other hand, when spurious RTO is detected, clear it. Clearing might be too heavy for some scenarios but seems a safe enough starting point for now and shouldn't have much effect except in majority of cases (if in any). By adding a new FLAG_ we avoid looping through write_queue when RTO occurs. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:11 -07:00
Ilpo Järvinen	7c46a03e67	[TCP]: Cleanup tcp_tso_acked and tcp_clean_rtx_queue Implements following cleanups: - Comment re-placement (CodingStyle) - tcp_tso_acked() local (wrapper-like) variable removal (readability) - __-types removed (IMHO they make local variables jumpy looking and just waste space) - acked -> flag (naming conventions elsewhere in TCP code) - linebreak adjustments (readability) - nested if()s combined (reduced indentation) - clarifying newlines added Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:10 -07:00
Ilpo Järvinen	13fcf850cc	[TCP]: Move accounting from tso_acked to clean_rtx_queue The accounting code is pretty much the same, so it's a shame we do it in two places. I'm not too sure if the added fully_acked check in MTU probing is really what we want; perhaps the added end_seq could be used in the after() comparison. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:09 -07:00
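To make the detection logic described above concrete, here is a simplified sketch of the basic (non-SACK) F-RTO decision from RFC 4138. It is not the kernel's implementation, and the SACK-enhanced variant that this commit enables by default uses SACK information instead. The names are hypothetical, recover is assumed to hold snd_nxt at the moment of the RTO, and the RFC's check that the first ACK covers the whole retransmitted segment is left out.

```c
#include <assert.h>
#include <stdint.h>

static inline int seq_before(uint32_t a, uint32_t b) { return (int32_t)(a - b) < 0; }

enum frto_verdict {
	FRTO_SEND_NEW,     /* 1st ACK advanced the window: send up to two new segments */
	FRTO_SPURIOUS,     /* 2nd ACK advanced the window too: the RTO was spurious */
	FRTO_CONVENTIONAL  /* fall back to conventional RTO recovery */
};

/* Hypothetical state. 'recover' is assumed to be snd_nxt at the RTO. */
struct frto_state {
	uint32_t snd_una;
	uint32_t recover;
	int acks_seen;     /* window-advancing ACKs since the RTO retransmission */
};

/* Classify one cumulative ACK in basic (non-SACK) F-RTO (simplified). */
static enum frto_verdict frto_on_ack(struct frto_state *st, uint32_t ack)
{
	if (!seq_before(st->snd_una, ack))
		return FRTO_CONVENTIONAL;      /* duplicate ACK */
	st->snd_una = ack;
	if (++st->acks_seen == 1)
		/* An ACK reaching 'recover' could answer either the original or the
		 * retransmission: no evidence either way, so be conservative. */
		return seq_before(ack, st->recover) ? FRTO_SEND_NEW : FRTO_CONVENTIONAL;
	return FRTO_SPURIOUS;
}
```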
Ilpo Järvinen	5af4ec236f	[TCP]: clear_all_retrans_hints prefixed by tcp_ In addition, fix its function comment spacing. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>	2007-10-10 16:52:09 -07:00
Ilpo Järvinen	91fed7a15c	[TCP]: Make fackets_out accurate Subtraction for fackets_out is unconditional when snd_una advances, thus there's no need to do it inside the loop. Just make sure correct bounds are honored. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:08 -07:00
Ilpo Järvinen	18f02545a9	[TCP] MIB: Add counters for discarded SACK blocks In DSACK case, some events are not extraordinary, such as packet duplication generated DSACK. They can arrive easily below snd_una when undo_marker is not set (TCP being in CA_Open); counting such DSACKs among SACK discards will likely just mislead if they occur in some scenario when there are other problems as well. Similarly, excessively delayed packets could cause "normal" DSACKs. Therefore, separate counters are allocated for DSACK events. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:48:30 -07:00
Ilpo Järvinen	5b3c98821a	[TCP]: Discard fuzzy SACK blocks SACK processing code has been a sort of Russian roulette as no validation of SACK blocks was previously attempted. Besides, it is not very clear what all kinds of broken SACK blocks really mean (e.g., one that has start and end sequence numbers reversed). So now close the roulette once and for all. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:48:29 -07:00
Ilpo Järvinen	6728e7dc3e	[TCP]: Rename tcp_ack_packets_out -> tcp_rearm_rto The only thing that tiny function does is rearm the RTO (if necessary), so name it accordingly. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:48:28 -07:00
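Here is a sketch of the bookkeeping described above, with a hypothetical struct standing in for the tcp_sock counters. When snd_una advances by pkts_acked segments, the leading ones are exactly the segments that fackets_out counts from snd_una, so one clamped subtraction outside the loop keeps the counter accurate.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical counters mirroring tcp_sock's packets_out and fackets_out. */
struct fack_counters {
	uint32_t packets_out; /* segments sent but not yet cumulatively ACKed */
	uint32_t fackets_out; /* segments from snd_una up to the forward-most SACKed one */
};

/* Called once per ACK that advances snd_una by pkts_acked segments. */
static void fack_on_cumulative_ack(struct fack_counters *c, uint32_t pkts_acked)
{
	assert(pkts_acked <= c->packets_out);
	c->packets_out -= pkts_acked;
	/* Honour the bound: fackets_out never goes below zero. */
	c->fackets_out -= (pkts_acked < c->fackets_out) ? pkts_acked : c->fackets_out;
}
```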
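Here is a simplified sketch of the kind of validation this commit introduces, assuming the sender's window is [snd_una, snd_nxt). It rejects reversed or empty blocks, blocks that extend past snd_nxt, and ordinary SACK blocks that do not start above snd_una. The kernel's real check also accepts D-SACK blocks below snd_una under extra conditions, which this hypothetical sack_block_valid() does not model.

```c
#include <assert.h>
#include <stdint.h>

static inline int seq_before(uint32_t a, uint32_t b) { return (int32_t)(a - b) < 0; }
static inline int seq_after(uint32_t a, uint32_t b) { return seq_before(b, a); }

/* Hypothetical check of one ordinary (non-D-SACK) block [start_seq, end_seq). */
static int sack_block_valid(uint32_t snd_una, uint32_t snd_nxt,
			    uint32_t start_seq, uint32_t end_seq)
{
	/* Reversed or empty blocks have no clear interpretation. */
	if (!seq_before(start_seq, end_seq))
		return 0;
	/* A block reaching beyond anything we have sent is bogus. */
	if (seq_after(end_seq, snd_nxt))
		return 0;
	/* A regular SACK block must start above the cumulative ACK point. */
	return seq_after(start_seq, snd_una);
}
```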
Ilpo Järvinen	e9144bd8da	[TCP]: Remove unnecessary wrapper tcp_packets_out_dec Makes caller side more obvious, there's no need to have a wrapper for this oneliner! Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:48:27 -07:00
Ilpo Järvinen	e60402d0a9	[TCP]: Move sack_ok access to obviously named funcs & cleanup Previously code had IsReno/IsFack defined as macros that were local to tcp_input.c though the sack_ok field has users elsewhere too for the same purpose. This changes them to static inlines as preferred according to the current coding style and unifies the access to sack_ok across multiple files. Magic bitops of sack_ok for FACK and DSACK are also abstracted to functions with appropriate names. Note: - One sack_ok = 1 remains but that's self-explanatory, i.e., it enables sack - Couple of !IsReno cases are changed to tcp_is_sack - There were no users for IsDSack => I dropped it Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:48:00 -07:00
Ilpo Järvinen	1b6d427bb7	[TCP]: Reduce sacked_out with reno when purging write_queue Previously TCP had a transitional state during which reno counted segments that are already below the current window into sacked_out, which is now prevented. In addition, now re-try the unconditional S+L skb catching. This approach conservatively calls just remove_sack and leaves reset_sack() calls alone. The best solution to the whole problem would be to first calculate the new sacked_out fully (this patch does not move reno_sack_reset calls from original sites and thus does not implement this). However, that would require a very invasive change to fastretrans_alert (perhaps even slicing it into two halves). Alternatively, all callers of tcp_packets_in_flight (i.e., users that depend on sacked_out) should be postponed until the new sacked_out has been calculated, but that alternative isn't any simpler. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:47:58 -07:00
Ilpo Järvinen	d02596e329	[TCP]: Keep state in Disorder also if only lost_out > 0 This happens rather infrequently and is only possible during FRTO. We must not allow TCP to slip to Open state because tcp_fastretrans_alert might then not be called in time when FRTO has exited. This became a problem when left_out got removed and was replaced by just sacked_out. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:47:58 -07:00
Ilpo Järvinen	86426c22d2	[TCP]: Restore over-zealous tcp_sync_left_out-like removals tcp_verify_left_out is useful for verifying S+L condition, so add it back to a couple of places where the code was not calling tcp_sync_left_out but used its own ad-hoc solution (before the tcp_sync_left_out got removed). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:47:57 -07:00
Ilpo Järvinen	005903bc3a	[TCP]: Left out sync->verify (the new meaning of it) & definify Left_out was dropped a while ago, thus leaving verifying consistency of the "left out" as the only task for the function in question. Thus make its name more appropriate. In addition, it is intentionally converted to #define instead of static inline because the location of the invariant failure is the most important thing to have if this ever triggers. I think it would have been helpful e.g. in this case where the location of the failure point had to be based on some guesswork: http://lkml.org/lkml/2007/5/2/464 ...Luckily the guesswork seems to have proved to be correct. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:47:57 -07:00
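The accessor pattern described above can be sketched as follows. The struct, helper names and bit layout are hypothetical stand-ins for tp->rx_opt.sack_ok and the tcp_is_sack()/tcp_is_reno()-style helpers. The point is that callers ask a named question instead of open-coding bit tests.

```c
#include <assert.h>

/* Hypothetical stand-in for tp->rx_opt.sack_ok. The bit layout is an
 * assumption for illustration: bit 0 = SACK negotiated, bit 1 = FACK
 * enabled, bit 2 = a D-SACK has been seen. */
enum {
	SACK_OK_SACK  = 1 << 0,
	SACK_OK_FACK  = 1 << 1,
	SACK_OK_DSACK = 1 << 2,
};

struct rx_options {
	unsigned int sack_ok;
};

/* Named accessors replace open-coded bit tests at every call site. */
static inline int is_sack(const struct rx_options *o) { return (o->sack_ok & SACK_OK_SACK) != 0; }
static inline int is_reno(const struct rx_options *o) { return !is_sack(o); }
static inline int is_fack(const struct rx_options *o) { return (o->sack_ok & SACK_OK_FACK) != 0; }
static inline void enable_fack(struct rx_options *o) { o->sack_ok |= SACK_OK_FACK; }
```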
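Without SACK, Reno emulates sacked_out by counting duplicate ACKs. The sketch below illustrates the reduction described above with hypothetical names. When a cumulative ACK frees `acked` segments, one of them is assumed to be the hole and each of the others is assumed to have produced one duplicate ACK, so sacked_out shrinks by acked - 1 (clamped at zero). Capping sacked_out at packets_out - 1 in reno_add_dupack() is an assumption made here to keep the invariant simple.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical Reno scoreboard: each duplicate ACK is taken as evidence
 * that one more segment has left the network. */
struct reno_scoreboard {
	uint32_t packets_out;
	uint32_t sacked_out;  /* dupACK-emulated "SACKed" segments */
};

static void reno_add_dupack(struct reno_scoreboard *sb)
{
	/* Assumption: keep at least one segment (the hole) outside sacked_out. */
	if (sb->sacked_out + 1 < sb->packets_out)
		sb->sacked_out++;
}

/* A cumulative ACK removed 'acked' segments from the write queue: one of
 * them filled the hole, the others had each produced a duplicate ACK. */
static void reno_remove_sacks(struct reno_scoreboard *sb, uint32_t acked)
{
	assert(acked <= sb->packets_out);
	if (acked > 0) {
		uint32_t eaten = acked - 1;
		sb->sacked_out = eaten >= sb->sacked_out ? 0 : sb->sacked_out - eaten;
	}
	sb->packets_out -= acked;
	assert(sb->sacked_out <= sb->packets_out);
}
```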
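The reason for preferring #define over static inline can be shown in a few lines. This sketch uses a hypothetical VERIFY_LEFT_OUT macro, with fprintf standing in for the kernel's own reporting. Because the macro expands at the call site, __FILE__ and __LINE__ identify the caller that tripped the invariant.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical check of the "left out" invariant
 * sacked_out + lost_out <= packets_out. Evaluates to 1 if it holds,
 * otherwise reports the call site on stderr and evaluates to 0. */
#define VERIFY_LEFT_OUT(sacked, lost, packets)                      \
	(((sacked) + (lost) <= (packets)) ? 1 :                     \
	 (fprintf(stderr, "left_out invariant broken at %s:%d\n",  \
		  __FILE__, __LINE__), 0))
```

Wrapping the same test in a static inline function would print the helper's own line number on every failure, and that is precisely the information the commit wants to preserve.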
Ilpo Järvinen	83ae40885f	[TCP]: Add tcp_left_out(tp) "back" to get cleaner looking lines tp->left_out got removed but nothing came to replace it back then (users just did addition by themselves), so add function for users now. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:47:56 -07:00
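The quantity that replaced the stored left_out field is cheap to compute. Here is a sketch with a hypothetical struct mirroring the tcp_sock counters, including the in-flight estimate built on top of it (packets_out - left_out + retrans_out):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of tcp_sock counters. */
struct scoreboard {
	uint32_t packets_out; /* sent, not cumulatively ACKed */
	uint32_t sacked_out;  /* SACKed (or dupACK-counted with Reno) */
	uint32_t lost_out;    /* marked lost */
	uint32_t retrans_out; /* retransmitted, not yet ACKed */
};

/* Segments that have left the network, computed on demand. */
static inline uint32_t left_out(const struct scoreboard *sb)
{
	return sb->sacked_out + sb->lost_out;
}

/* Estimated segments still in the network. */
static inline uint32_t packets_in_flight(const struct scoreboard *sb)
{
	assert(left_out(sb) <= sb->packets_out);
	return sb->packets_out - left_out(sb) + sb->retrans_out;
}
```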
Ilpo Järvinen	b5860bbac7	[TCP]: Tighten tcp_sock's belt, drop left_out It is easily calculable when needed and users are not that many after all. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:47:55 -07:00
Ilpo Järvinen	bdf1ee5d3b	[TCP]: Move code from tcp_ecn.h to tcp*.c and tcp.h & remove it No other users exist for tcp_ecn.h. Very few things remain in tcp.h, for most TCP ECN functions callers reside within a single .c file and can be placed there. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:47:54 -07:00
Ilpo Järvinen	9bff40fda0	[TCP] FRTO: remove unnecessary fackets/sacked_out recounting F-RTO does not touch SACKED_ACKED bits at all, so there is no need to recount them in tcp_enter_frto_loss. After removal of the else branch, nested ifs can be combined. This must also reset sacked_out when SACK is not in use, as TCP could have received some duplicate ACKs prior to RTO. To achieve that in a sane manner, tcp_reset_reno_sack was re-placed by the previous patch. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:47:53 -07:00
Ilpo Järvinen	4ddf66769d	[TCP]: Move Reno SACKed_out counter functions earlier Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:47:52 -07:00
David S. Miller	d06e021d71	[TCP]: Extract DSACK detection code from tcp_sacktag_write_queue(). Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:47:51 -07:00
Ilpo Järvinen	19b2b48658	[TCP]: Rexmit hint must be cleared instead of setting it Stupid error from my side. Even though now that I noticed this, I hoped it would have been an optimization but no, the counter hint is then incorrect. Thus clearing is necessary for now (I still suspect though that this path is never executed). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:47:51 -07:00
Ilpo Järvinen	d8f4f2235a	[TCP]: Extracted rexmit hint clearing from the LOST marking code Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:47:50 -07:00
Ilpo Järvinen	d738cd8fca	[TCP]: Add highest_sack seqno, points to globally highest SACK It is guaranteed to be valid only when !tp->sacked_out. In most cases this seqno is available in the last ACK but there is no guarantee for that. The new fast recovery loss marking algorithm needs this as entry point. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:47:50 -07:00
Ilpo Järvinen	48611c47d0	[TCP]: Fix fastpath_cnt_hint when GSO skb is partially ACKed When only a GSO skb was partially ACKed, no hints are reset, therefore fastpath_cnt_hint must be tweaked too or else it can corrupt fackets_out. For the corruption to occur, one must have a non-trivial ACK/SACK sequence, so this bug is not very often that harmful. There's a fackets_out state reset in TCP because fackets_out is known to be inaccurate, and that fixes the issue eventually anyway. In case there was also at least one skb that got fully ACKed, the fastpath_skb_hint is set to NULL, which causes a recount for fastpath_cnt_hint (the old value won't be accessed anymore), thus it can safely be decremented without additional checking. Reported by Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-07 23:43:10 -07:00
David S. Miller	5c127c58ae	[TCP]: 'dst' can be NULL in tcp_rto_min() Reported by Rick Jones. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-08-31 14:39:44 -07:00
David S. Miller	05bb1fad1c	[TCP]: Allow minimum RTO to be configurable via routing metrics. Cell phone networks do link layer retransmissions and other things that cause unnecessary timeout retransmits. So allow the minimum RTO to be inflated per-route to deal with this. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-08-30 22:10:28 -07:00
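Here is a sketch of the per-route minimum RTO lookup, including the NULL check added by the follow-up fix listed above it. The route_metrics struct is hypothetical, and a nonzero rto_min_ms stands in for "this route sets the metric"; the kernel's exact condition differs in detail. The 200 ms fallback assumes Linux's TCP_RTO_MIN of HZ/5.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: mirrors Linux's TCP_RTO_MIN (HZ/5, i.e. 200 ms). */
#define DEFAULT_RTO_MIN_MS 200u

/* Hypothetical per-route metrics; 0 means "not set on this route". */
struct route_metrics {
	uint32_t rto_min_ms;
};

/* The socket may have no cached route, so 'dst' can be NULL. */
static uint32_t rto_min_for(const struct route_metrics *dst)
{
	if (dst != NULL && dst->rto_min_ms != 0)
		return dst->rto_min_ms;
	return DEFAULT_RTO_MIN_MS;
}
```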
David S. Miller	26722873a4	[TCP]: Describe tcp_init_cwnd() thoroughly in a comment. People often get tripped up by this function and think that it does not implement the prescribed algorithms from RFC2414 and RFC3390, even though it does. So add a comment to head off such misunderstandings in the future. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-08-26 18:35:36 -07:00
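RFC 3390 defines the initial window as min(4*MSS, max(2*MSS, 4380 bytes)). Rounded down to whole segments, that yields the MSS thresholds below. This sketch (function name hypothetical) leaves out the per-route override and the cwnd clamp that a real implementation would also apply.

```c
#include <assert.h>
#include <stdint.h>

/* Initial window in whole segments per RFC 3390: the window may not
 * exceed 4380 bytes unless 2*MSS alone already does. */
static uint32_t rfc3390_init_cwnd(uint32_t mss)
{
	if (mss > 1460)      /* 3 * 1461 > 4380 */
		return 2;
	if (mss > 1095)      /* 4 * 1096 > 4380 */
		return 3;
	return 4;
}
```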
Ilpo Järvinen	49ff4bb4cd	[TCP]: DSACK signals data receival, be conservative In case a DSACK is received, it's better to lower cwnd as it's a sign of data receival. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-08-02 19:47:59 -07:00
Ilpo Järvinen	2e6052941a	[TCP]: Also handle snd_una changes in tcp_cwnd_down tcp_cwnd_down must check for it too, as it should be conservative in case of collapse stuff and also when the receiver is trying to lie (though that wouldn't be very successful/useful anyway). Note: - Separated also is_dupack and do_lost in fast_retransalert * Much cleaner look-and-feel now * This time it really fixes cumulative ACK with many new SACK blocks recovery entry (I claimed the last patch fixed this but it didn't). TCP will now call tcp_update_scoreboard regardless of is_dupack when in recovery as long as there is enough fackets_out. - Introduce FLAG_SND_UNA_ADVANCED * Some prior_snd_una arguments are unnecessary after it - Added helper FLAG_ANY_PROGRESS to avoid long FLAG...|FLAG... constructs Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-08-02 19:46:58 -07:00
Ilpo Järvinen	b8ed601cef	[TCP]: Bidir flow must not disregard SACK blocks for lost marking It's possible that new SACK blocks that should trigger new LOST markings arrive with new data (which previously made is_dupack false). In addition, I think this fixes a case where we get a cumulative ACK with enough SACK blocks to trigger the fast recovery (is_dupack would be false there too). I'm not completely pleased with this solution because readability of the code is somewhat questionable as 'is_dupack' in SACK case is no longer about dupacks only but would mean something like 'lost_marker_work_todo' too... But because of Eifel stuff done in CA_Recovery, the FLAG_DATA_SACKED check cannot be placed in the if statement, which seems an attractive solution. Nevertheless, I didn't like adding another variable just for that either... :-) Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-31 02:28:31 -07:00
Ilpo Järvinen	1e757f9996	[TCP]: Fix ratehalving with bidirectional flows Actually, the ratehalving seems to work too well, as cwnd is reduced on every second ACK even though the packets in flight remain unchanged. Recoveries in bidirectional flows suffer quite badly because of this; both NewReno and SACK are affected. After this patch, rate halving is performed for an ACK only if packets in flight supposedly changed too. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-31 02:28:30 -07:00
Stephen Hemminger	30cfd0baf0	[TCP]: congestion control API pass RTT in microseconds This patch changes the API for the callback that is done after an ACK is received. It solves a couple of issues: * Some congestion controls want a higher resolution value of RTT (controlled by TCP_CONG_RTT_SAMPLE flag). These don't really want a ktime, but all compute an RTT in microseconds. * Other congestion control could use RTT at jiffies resolution. To keep the API consistent, the units should be the same for both cases; just the resolution should change. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-31 02:27:57 -07:00
Stephen Hemminger	16751347a0	[TCP]: remove unused argument to cong_avoid op None of the existing TCP congestion controls use the rtt value passed in the ca_ops->cong_avoid interface. Which is lucky because seq_rtt could have been -1 when handling a duplicate ack. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-18 01:46:58 -07:00
Ilpo Järvinen	0a9f2a467d	[TCP]: Verify the presence of RETRANS bit when leaving FRTO For a yet unknown reason, something cleared the SACKED_RETRANS bit underneath FRTO. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-15 00:19:29 -07:00
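Here is a sketch of the rate-halving rule after this fix. cwnd drops by one segment on every second ACK, but only ACKs that signal progress are counted, so ACKs that merely carry reverse-direction data no longer shrink cwnd. The flag names echo FLAG_SND_UNA_ADVANCED and FLAG_ANY_PROGRESS from the neighbouring commits, but their values, the struct and the function are hypothetical, and the kernel's condition has further cases (for example around D-SACKs) that are omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ACK flags; values are illustrative only. */
#define ACK_DATA_ACKED        0x1 /* cumulative ACK covered new data */
#define ACK_DATA_SACKED       0x2 /* ACK carried new SACK information */
#define ACK_SND_UNA_ADVANCED  0x4 /* snd_una moved */
#define ACK_ANY_PROGRESS (ACK_DATA_ACKED | ACK_DATA_SACKED | ACK_SND_UNA_ADVANCED)

/* Hypothetical recovery state. */
struct rh_state {
	uint32_t snd_cwnd;     /* congestion window, in segments */
	uint32_t snd_cwnd_cnt; /* parity of counted ACKs (0 or 1) */
	uint32_t cwnd_min;     /* do not reduce cwnd below this */
	uint32_t in_flight;    /* current packets-in-flight estimate */
};

static void rate_halving_on_ack(struct rh_state *s, int flag)
{
	uint32_t decr;

	if (!(flag & ACK_ANY_PROGRESS))
		return;                  /* in-flight unchanged: leave cwnd alone */
	decr = s->snd_cwnd_cnt + 1;
	s->snd_cwnd_cnt = decr & 1;
	decr >>= 1;                      /* 1 on every second counted ACK */
	if (decr && s->snd_cwnd > s->cwnd_min)
		s->snd_cwnd -= decr;
	if (s->snd_cwnd > s->in_flight + 1)
		s->snd_cwnd = s->in_flight + 1;
}
```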
Ilpo Järvinen	7769f4064c	[TCP]: Fix logic breakage due to DSACK separation Commit `6f74651ae6` is found guilty of breaking DSACK counting, which should be done only for the SACK block reported by the DSACK instead of every SACK block that is received along with DSACK information. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-06-15 15:14:04 -07:00
Ilpo Järvinen	b9ce204f0a	[TCP]: Congestion control API RTT sampling fix Commit `164891aadf` broke RTT sampling of congestion control modules. Inaccurate timestamps could be fed to them without providing any way for them to identify such cases. Previously RTT sampler was called only if FLAG_RETRANS_DATA_ACKED was not set filtering inaccurate timestamps nicely. In addition, the new behavior could give an invalid timestamp (zero) to RTT sampler if only skbs with TCPCB_RETRANS were ACKed. This solves both problems. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-06-15 15:08:43 -07:00
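The filtering described above is essentially Karn's rule. An ACK that covers retransmitted data cannot tell which transmission it answers, so it should yield no RTT sample at all, and the caller must never receive a zero timestamp as if it were valid. Here is a sketch with hypothetical names, returning microseconds in line with the RTT API change listed above. Measuring from the most recently sent ACKed segment is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for each segment covered by one ACK. */
struct acked_seg {
	uint32_t sent_us;   /* transmit timestamp, microseconds */
	int retransmitted;  /* nonzero if this segment was ever retransmitted */
};

/* Return an RTT sample in microseconds, or -1 if no trustworthy sample exists. */
static int64_t rtt_sample_us(const struct acked_seg *segs, size_t n, uint32_t now_us)
{
	if (n == 0)
		return -1; /* nothing ACKed: no sample, never a zero RTT */
	for (size_t i = 0; i < n; i++)
		if (segs[i].retransmitted)
			return -1; /* ambiguous ACK: skip sampling */
	/* Unsigned subtraction tolerates wraparound of the 32-bit clock. */
	return (int64_t)(uint32_t)(now_us - segs[n - 1].sent_us);
}
```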
Ilpo Järvinen	d7ea5b91fa	[TCP]: Add missing break to TCP option parsing code This flaw does not affect any behavior (currently). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-06-14 12:58:26 -07:00
Ilpo Järvinen	af15cc7b85	[TCP]: Fix left_out setting during FRTO Without FRTO, the tcp_try_to_open is never called with lost_out > 0 (see tcp_time_to_recover). However, when FRTO is enabled, the !tp->lost condition is not used until end of FRTO because that way TCP avoids premature entry to fast recovery during FRTO. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-06-12 16:16:44 -07:00


256 Commits