Commit Graph

494752 Commits

Author SHA1 Message Date
Alexander Duyck
21d1f11db0 fib_trie: Remove checks for index >= tnode_child_length from tnode_get_child
For some reason the compiler doesn't seem to understand that when we are in
a loop that runs from tnode_child_length - 1 to 0 we don't expect the value
of tn->bits to change.  As such every call to tnode_get_child was rerunning
tnode_chile_length which ended up consuming quite a bit of space in the
resultant assembly code.

I have gone though and verified that in all cases where tnode_get_child
is used we are either winding though a fixed loop from tnode_child_length -
1 to 0, or are in a fastpath case where we are verifying the value by
either checking for any remaining bits after shifting index by bits and
testing for leaf, or by using tnode_child_length.

size net/ipv4/fib_trie.o
Before:
   text	   data	    bss	    dec	    hex	filename
  15506	    376	      8	  15890	   3e12	net/ipv4/fib_trie.o

After:
   text	   data	    bss	    dec	    hex	filename
  14827	    376	      8	  15211	   3b6b	net/ipv4/fib_trie.o

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31 18:25:55 -05:00
Alexander Duyck
12c081a5c8 fib_trie: inflate/halve nodes in a more RCU friendly way
This change pulls the node_set_parent functionality out of put_child_reorg
and instead leaves that to the function to take care of as well.  By doing
this we can fully construct the new cluster of tnodes and all of the
pointers out of it before we start routing pointers into it.

I am suspecting this will likely fix some concurency issues though I don't
have a good test to show as such.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31 18:25:55 -05:00
Alexander Duyck
fc86a93b46 fib_trie: Push tnode flushing down to inflate/halve
This change pushes the tnode freeing down into the inflate and halve
functions.  It makes more sense here as we have a better grasp of what is
going on and when a given cluster of nodes is ready to be freed.

I believe this may address a bug in the freeing logic as well.  For some
reason if the freelist got to a certain size we would call
synchronize_rcu().  I'm assuming that what they meant to do is call
synchronize_rcu() after they had handed off that much memory via
call_rcu().  As such that is what I have updated the behavior to be.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31 18:25:55 -05:00
Alexander Duyck
ff181ed876 fib_trie: Push assignment of child to parent down into inflate/halve
This change makes it so that the assignment of the tnode to the parent is
handled directly within whatever function is currently handling the node be
it inflate, halve, or resize.  By doing this we can avoid some of the need
to set NULL pointers in the tree while we are resizing the subnodes.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31 18:25:55 -05:00
Alexander Duyck
f05a48198b fib_trie: Add functions should_inflate and should_halve
This change pulls the logic for if we should inflate/halve the nodes out
into separate functions.  It also addresses what I believe is a bug where 1
full node is all that is needed to keep a node from ever being halved.

Simple script to reproduce the issue:
	modprobe dummy;	ifconfig dummy0 up
	for i in `seq 0 255`; do ifconfig dummy0:$i 10.0.${i}.1/24 up; done
	ifconfig dummy0:256 10.0.255.33/16 up
	for i in `seq 0 254`; do ifconfig dummy0:$i down; done

Results from /proc/net/fib_triestat
Before:
	Local:
		Aver depth:     3.00
		Max depth:      4
		Leaves:         17
		Prefixes:       18
		Internal nodes: 11
		  1: 8  2: 2  10: 1
		Pointers: 1048
	Null ptrs: 1021
	Total size: 11  kB
After:
	Local:
		Aver depth:     3.41
		Max depth:      5
		Leaves:         17
		Prefixes:       18
		Internal nodes: 12
		  1: 8  2: 3  3: 1
		Pointers: 36
	Null ptrs: 8
	Total size: 3  kB

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31 18:25:54 -05:00
Alexander Duyck
cf3637bb8f fib_trie: Move resize to after inflate/halve
This change consists of a cut/paste of resize to behind inflate and halve
so that I could remove the two function prototypes.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31 18:25:54 -05:00
Alexander Duyck
345e9b5426 fib_trie: Push rcu_read_lock/unlock to callers
This change is to start cleaning up some of the rcu_read_lock/unlock
handling.  I realized while reviewing the code there are several spots that
I don't believe are being handled correctly or are masking warnings by
locally calling rcu_read_lock/unlock instead of calling them at the correct
level.

A common example is a call to fib_get_table followed by fib_table_lookup.
The rcu_read_lock/unlock ought to wrap both but there are several spots where
they were not wrapped.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31 18:25:54 -05:00
Alexander Duyck
98293e8d2f fib_trie: Use unsigned long for anything dealing with a shift by bits
This change makes it so that anything that can be shifted by, or compared
to a value shifted by bits is updated to be an unsigned long.  This is
mostly a precaution against an insanely huge address space that somehow
starts coming close to the 2^32 root node size which would require
something like 1.5 billion addresses.

I chose unsigned long instead of unsigned long long since I do not believe
it is possible to allocate a 32 bit tnode on a 32 bit system as the memory
consumed would be 16GB + 28B which exceeds the addressible space for any
one process.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31 18:25:54 -05:00
Alexander Duyck
e9b44019d4 fib_trie: Update meaning of pos to represent unchecked bits
This change moves the pos value to the other side of the "bits" field.  By
doing this it actually simplifies a significant amount of code in the trie.

For example when halving a tree we know that the bit lost exists at
oldnode->pos, and if we inflate the tree the new bit being add is at
tn->pos.  Previously to find those bits you would have to subtract pos and
bits from the keylength or start with a value of (1 << 31) and then shift
that.

There are a number of spots throughout the code that benefit from this.  In
the case of the hot-path searches the main advantage is that we can drop 2
or more operations from the search path as we no longer need to compute the
value for the index to be shifted by and can instead just use the raw pos
value.

In addition the tkey_extract_bits is now defunct and can be replaced by
get_index since the two operations were doing the same thing, but now
get_index does it much more quickly as it is only an xor and shift versus a
pair of shifts and a subtraction.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31 18:25:54 -05:00
Alexander Duyck
836a0123c9 fib_trie: Optimize fib_table_insert
This patch updates the fib_table_insert function to take advantage of the
changes made to improve the performance of fib_table_lookup.  As a result
the code should be smaller and run faster then the original.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31 18:25:54 -05:00
Alexander Duyck
939afb0657 fib_trie: Optimize fib_find_node
This patch makes use of the same features I made use of for
fib_table_lookup to streamline fib_find_node.  The resultant code should be
smaller and run faster than the original.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31 18:25:54 -05:00
Alexander Duyck
9f9e636d4f fib_trie: Optimize fib_table_lookup to avoid wasting time on loops/variables
This patch is meant to reduce the complexity of fib_table_lookup by reducing
the number of variables to the bare minimum while still keeping the same if
not improved functionality versus the original.

Most of this change was started off by the desire to rid the function of
chopped_off and current_prefix_length as they actually added very little to
the function since they only applied when computing the cindex.  I was able
to replace them mostly with just a check for the prefix match.  As long as
the prefix between the key and the node being tested was the same we know
we can search the tnode fully versus just testing cindex 0.

The second portion of the change ended up being a massive reordering.
Originally the calls to check_leaf were up near the start of the loop, and
the backtracing and descending into lower levels of tnodes was later.  This
didn't make much sense as the structure of the tree means the leaves are
always the last thing to be tested.  As such I reordered things so that we
instead have a loop that will delve into the tree and only exit when we
have either found a leaf or we have exhausted the tree.  The advantage of
rearranging things like this is that we can fully inline check_leaf since
there is now only one reference to it in the function.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31 18:25:54 -05:00
Alexander Duyck
adaf981685 fib_trie: Merge leaf into tnode
This change makes it so that leaf and tnode are the same struct.  As a
result there is no need for rt_trie_node anymore since everyting can be
merged into tnode.

On 32b systems this results in the leaf being 4 bytes larger, however I
don't know if that is really an issue as this and an eariler patch that
added bits & pos have increased the size from 20 to 28.  If I am not
mistaken slub/slab allocate on power of 2 sizes so 20 was likely being
rounded up to 32 anyway.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31 18:25:54 -05:00
Alexander Duyck
37fd30f2da fib_trie: Merge tnode_free and leaf_free into node_free
Both the leaf and the tnode had an rcu_head in them, but they had them in
slightly different places.  Since we now have them in the same spot and
know that any node with bits == 0 is a leaf and the rest are either vmalloc
or kmalloc tnodes depending on the value of bits it makes it easy to combine
the functions and reduce overhead.

In addition I have taken advantage of the rcu_head pointer to go ahead and
put together a simple linked list instead of using the tnode pointer as
this way we can merge either type of structure for freeing.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31 18:25:54 -05:00
Alexander Duyck
64c9b6fb26 fib_trie: Make leaf and tnode more uniform
This change makes some fundamental changes to the way leaves and tnodes are
constructed.  The big differences are:
1.  Leaves now populate pos and bits indicating their full key size.
2.  Trie nodes now mask out their lower bits to be consistent with the leaf
3.  Both structures have been reordered so that rt_trie_node now consisists
    of a much larger region including the pos, bits, and rcu portions of
    the tnode structure.

On 32b systems this will result in the leaf being 4B larger as the pos and
bits values were added to a hole created by the key as it was only 4B in
length.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31 18:25:54 -05:00
Alexander Duyck
8274a97aa4 fib_trie: Update usage stats to be percpu instead of global variables
The trie usage stats were currently being shared by all threads that were
calling fib_table_lookup.  As a result when multiple threads were
performing lookups simultaneously the trie would begin to cache bounce
between those threads.

In order to prevent this I have updated the usage stats to use a set of
percpu variables.  By doing this we should be able to avoid the cache
bouncing and still make use of these stats.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31 18:25:53 -05:00
stephen hemminger
bec94d430f gre: allow live address change
The GRE tap device supports Ethernet over GRE, but doesn't
care about the source address of the tunnel, therefore it
can be changed without bring device down.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31 14:18:28 -05:00
Bill Hong
33f72e6f0c l2tp : multicast notification to the registered listeners
Previously l2tp module did not provide any means for the user space to
get notified when tunnels/sessions are added/modified/deleted.
This change contains the following
- create a multicast group for the listeners to register.
- notify the registered listeners when the tunnels/sessions are
  created/modified/deleted.

Signed-off-by: Bill Hong <bhong@brocade.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Sven-Thorsten Dietrich <sven@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31 14:17:20 -05:00
Alex Gartrell
957f094f22 tun: return proper error code from tun_do_read
Instead of -1 with EAGAIN, read on a O_NONBLOCK tun fd will return 0.  This
fixes this by properly returning the error code from __skb_recv_datagram.

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31 14:14:54 -05:00
Alex Gartrell
87897931c8 tun: Fixed unsigned/signed comparison
Validated that this was actually using the unsigned comparison with gdb.

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31 14:14:54 -05:00
Fabian Frederick
886eaa1fe6 tipc: replace 0 by NULL for pointers
Fix sparse warning:
net/tipc/link.c:1924:40: warning: Using plain integer as NULL pointer

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31 13:11:39 -05:00
David S. Miller
f230332fa4 Merge branch 'enic-next'
Govindarajulu Varadarajan says:

====================
enic: Check for DMA mapping error

After dma mapping the buffers, enic does not call dma_mapping_error() to check
if mapping is successful.

This series fixes the issue by checking return value of pci_dma_mapping_error()
after pci_map_single().

This is reported by redhat here
https://bugzilla.redhat.com/show_bug.cgi?id=1145016
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31 13:08:54 -05:00
Govindarajulu Varadarajan
58feff07e9 enic: add stats for dma mapping error
This patch adds generic statistics for enic. As of now dma_map_error is the only
member. dma_map_erro is incremented every time dma maping error happens.

Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31 13:08:45 -05:00
Govindarajulu Varadarajan
065df159ec enic: check dma_mapping_error
This patch checks for pci_dma_mapping_error() after dma mapping the data.
If the dma mapping fails we remove the previously queued frags and return
NETDEV_TX_OK.

Reported-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31 13:08:45 -05:00
Govindarajulu Varadarajan
5e32066d00 enic: make vnic_wq_buf doubly linked
This patch makes vnic_wq_buf doubly liked list. This is needed for dma_mapping
error check, in case some frag's dma map fails, we need to move back and remove
previously queued buffers.

Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31 13:08:45 -05:00
David S. Miller
5164172f33 Merge branch 'fec-next'
Fugang Duan says:

====================
net: fec: add Wake-on-LAN support

The patch series enable FEC Wake-on-LAN feature for i.MX6q/dl and i.MX6SX SOCs.
FEC HW support sleep mode, when system in suspend status with FEC all clock gate
off, magic packet can wake up system. For different SOCs, there have special SOC
GPR register to let FEC enter sleep mode or exit sleep mode, add these to platform
callback for driver' call.

Patch#1: add WOL interface supports.
Patch#2: add SOC special sleep of/off operations for driver's sleep callback.
Patch#3: add magic pattern support for devicetree.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31 13:06:57 -05:00
Nimrod Andy
07b4d2dda0 ARM: dts: imx6qdl: enable FEC magic-packet feature
Add FEC magic-packet feature support.

Signed-off-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31 13:06:51 -05:00
Nimrod Andy
456062b3ec ARM: imx: add FEC sleep mode callback function
i.MX6q/dl, i.MX6SX SOCs enet support sleep mode that magic packet can
wake up system in suspend status. For different SOCs, there have some
SOC specifical GPR register to set sleep on/off mode. So add these to
callback function for driver.

Signed-off-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31 13:06:51 -05:00
Nimrod Andy
de40ed31b3 net: fec: add Wake-on-LAN support
Support for Wake-on-LAN using Magic Packet. ENET IP supports sleep mode
in low power status, when system enter suspend status, Magic packet can
wake up system even if all SOC clocks are gate. The patch doing below things:
- flagging the device as a wakeup source for the system, as well as
  its Wake-on-LAN interrupt
- prepare the hardware for entering WoL mode
- add standard ethtool WOL interface
- enable the ENET interrupt to wake us

Tested on i.MX6q/dl sabresd, sabreauto boards, i.MX6SX arm2 boards.

Signed-off-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31 13:06:50 -05:00
Kevin Hao
03366a33db net: gianfar: add missing __iomem annotation
Fix the following spare warning:
drivers/net/ethernet/freescale/gianfar.c:3521:60: warning: incorrect type in argument 1 (different address spaces)
drivers/net/ethernet/freescale/gianfar.c:3521:60:    expected unsigned int [noderef] <asn:2>*addr
drivers/net/ethernet/freescale/gianfar.c:3521:60:    got unsigned int [usertype] *rfbptr
drivers/net/ethernet/freescale/gianfar.c:205:16: warning: incorrect type in assignment (different address spaces)
drivers/net/ethernet/freescale/gianfar.c:205:16:    expected unsigned int [usertype] *rfbptr
drivers/net/ethernet/freescale/gianfar.c:205:16:    got unsigned int [noderef] <asn:2>*<noident>
drivers/net/ethernet/freescale/gianfar.c:2918:44: warning: incorrect type in argument 1 (different address spaces)
drivers/net/ethernet/freescale/gianfar.c:2918:44:    expected unsigned int [noderef] <asn:2>*addr
drivers/net/ethernet/freescale/gianfar.c:2918:44:    got unsigned int [usertype] *rfbptr

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-30 18:54:15 -05:00
Kevin Hao
91c53f7635 net: gianfar: mark the local functions static
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-30 18:54:15 -05:00
Jason Wang
41f2f1273c virtio-net: don't do header check for dodgy gso packets
There's no need to do header check for virtio-net since:

- Host sets dodgy for all gso packets from guest and check the header.
- Host should be prepared for all kinds of evil packets from guest, since
  malicious guest can send any kinds of packet.

So this patch sets NETIF_F_GSO_ROBUST for virtio-net to skip the check.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-30 18:53:20 -05:00
Dmitry Eremin-Solenikov
dd45077799 arm: sa1100: move irda header to linux/platform_data
In the end asm/mach/irda.h header is not used by anybody except sa1100.
Move the header to the platform data includes dir and rename it to
irda-sa11x0.h.

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-30 18:44:07 -05:00
Julia Lawall
12141337af net: sxgbe: Use setup_timer
Convert a call to init_timer and accompanying intializations of
the timer's data and function fields to a call to setup_timer.

A simplified version of the semantic match that fixes this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
expression t,f,d;
@@

-init_timer(&t);
+setup_timer(&t,f,d);
-t.function = f;
-t.data = d;
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-30 18:34:23 -05:00
Julia Lawall
fea3cb063b ksz884x: Use setup_timer
Convert a call to init_timer and accompanying intializations of
the timer's data and function fields to a call to setup_timer.

A simplified version of the semantic match that fixes this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
expression t,f,d;
@@

-init_timer(&t);
+setup_timer(&t,f,d);
-t.function = f;
-t.data = d;
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-30 18:34:23 -05:00
Julia Lawall
cc73e41f17 atl1e: Use setup_timer
Convert a call to init_timer and accompanying intializations of
the timer's data and function fields to a call to setup_timer.

A simplified version of the semantic match that fixes this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
expression t,f,d;
@@

-init_timer(&t);
+setup_timer(&t,f,d);
-t.function = f;
-t.data = d;
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-30 18:34:23 -05:00
Julia Lawall
d0e520ef63 atheros: atlx: Use setup_timer
Convert a call to init_timer and accompanying intializations of
the timer's data and function fields to a call to setup_timer.

A simplified version of the semantic match that fixes this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
expression t,f,d;
@@

-init_timer(&t);
+setup_timer(&t,f,d);
-t.function = f;
-t.data = d;
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-30 18:34:23 -05:00
David S. Miller
5115ec9654 Merge branch 'timecounter'
Richard Cochran says:

====================
Time Counter fixes and improvements

Several PTP Hardware Clock (PHC) drivers implement the clock in
software using the timecounter/cyclecounter code. This series adds one
simple improvement and one more subtle fix to the shared timecounter
facility. Credit for this series goes to Janusz Użycki, who pointed
the issues out to me off list.

Patch #1 simply move the timecounter code into its own file. When
working on this series, it was really annoying to see half the kernel
recompile after every tweak to the timecounter stuff. There is no
reason to keep this together with the clocksource code.

Patch #2 implements an improved adjtime() method, and patches 3-10
convert all of the drivers over to the new method.

Patch #11 fixes a subtle but important issue with the timecounter WRT
frequency adjustment. As it stands now, a timecounter based PHC will
exhibit a variable frequency resolution (and variable time error)
depending on how often the clock is read.

In timecounter_read_delta(), the expression

   (delta * cc->mult) >> cc->shift;

can lose resolution from the adjusted value of 'mult'. If the value
of 'delta' is too small, then small changes in 'mult' have no effect.
However, if the delta value is large enough, then small changes in
'mult' will have an effect.

Reading the clock too often means smaller 'delta' values which in turn
will spoil the fine adjustments made to 'mult'. Up until now, this
effect did not show up in my testing. The following example explains
why.

The CPTS has an input clock of 250 MHz, and the clock source uses
mult=0x80000000 and shift=29, making the ticks to nanoseconds
conversion like this:

   ticks * 2^31
   ------------
       2^29

Imagine what happens if the clock is read every 10 milliseconds. Ten
milliseconds are about 2500000 ticks, which corresponds to about 21
bits. The product in the numerator has then 52 bits. After the shift
operation, 23 bits are preserved. This results in a frequency
adjustment resolution of about 0.1 ppm (not _too_ bad.)

A frequency resolution of 1 ppm requires 20 bits.
A frequency resolution of 1 ppb requires 30 bits.

For the 250 MHz CPTS clock, reading every 4 seconds yields a 1 ppb
resolution (which is the finest that our API allows).

However, the error can be much higher if the clock is read too often
or if time stamps occur close in time to read operations. In general
it is really not acceptable to allow the rate of clock readings to
influence the clock accuracy.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-30 18:30:00 -05:00
Richard Cochran
2eebdde652 timecounter: keep track of accumulated fractional nanoseconds
The current timecounter implementation will drop a variable amount
of resolution, depending on the magnitude of the time delta. In
other words, reading the clock too often or too close to a time
stamp conversion will introduce errors into the time values. This
patch fixes the issue by introducing a fractional nanosecond field
that accumulates the low order bits.

Reported-by: Janusz Użycki <j.uzycki@elproma.com.pl>
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-30 18:29:27 -05:00
Richard Cochran
f25a30be35 net: cpts: convert to timecounter adjtime.
This patch changes the driver to use the new and improved method
for adjusting the offset of a timecounter.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-30 18:29:27 -05:00
Richard Cochran
ce51ff0937 net: mlx4: convert to timecounter adjtime.
This patch changes the driver to use the new and improved method
for adjusting the offset of a timecounter.

Compile tested only.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-30 18:29:27 -05:00
Richard Cochran
826ef90dc4 net: ixgbe: convert to timecounter adjtime.
This patch changes the driver to use the new and improved method
for adjusting the offset of a timecounter.

Compile tested only.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-30 18:29:27 -05:00
Richard Cochran
5ee698e367 net: igb: convert to timecounter adjtime.
This patch changes the driver to use the new and improved method
for adjusting the offset of a timecounter.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-30 18:29:26 -05:00
Richard Cochran
f4de2b9568 net: e1000e: convert to timecounter adjtime.
This patch changes the driver to use the new and improved method
for adjusting the offset of a timecounter.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-30 18:29:26 -05:00
Richard Cochran
59e16961c6 net: fec: convert to timecounter adjtime.
This patch changes the driver to use the new and improved method
for adjusting the offset of a timecounter.

Compile tested only.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-30 18:29:26 -05:00
Richard Cochran
2e5601f9ac net: bnx2x: convert to timecounter adjtime.
This patch changes the driver to use the new and improved method
for adjusting the offset of a timecounter.

Compile tested only.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-30 18:29:26 -05:00
Richard Cochran
8b976e9777 net: xgbe: convert to timecounter adjtime.
This patch changes the driver to use the new and improved method
for adjusting the offset of a timecounter.

Compile tested only.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-30 18:29:26 -05:00
Richard Cochran
796c1efd6f timecounter: provide a helper function to shift the time.
Some PTP Hardware Clock drivers use a struct timecounter to represent
their clock. To adjust the time by a given offset, these drivers all
perform a two step read/write of their timecounter. However, it is
better and simpler just to adjust the offset in one step. This patch
introduces a little routine to help drivers implement the adjtime
method.

Suggested-by: Janusz Użycki <j.uzycki@elproma.com.pl>
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-30 18:29:25 -05:00
Richard Cochran
74d23cc704 time: move the timecounter/cyclecounter code into its own file.
The timecounter code has almost nothing to do with the clocksource
code. Let it live in its own file. This will help isolate the
timecounter users from the clocksource users in the source tree.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-30 18:29:25 -05:00
Linus Torvalds
2c90331cf5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix double SKB free in bluetooth 6lowpan layer, from Jukka Rissanen.

 2) Fix receive checksum handling in enic driver, from Govindarajulu
    Varadarajan.

 3) Fix NAPI poll list corruption in virtio_net and caif_virtio, from
    Herbert Xu.  Also, add code to detect drivers that have this mistake
    in the future.

 4) Fix doorbell endianness handling in mlx4 driver, from Amir Vadai.

 5) Don't clobber IP6CB() before xfrm6_policy_check() is called in TCP
    input path,f rom Nicolas Dichtel.

 6) Fix MPLS action validation in openvswitch, from Pravin B Shelar.

 7) Fix double SKB free in vxlan driver, also from Pravin.

 8) When we scrub a packet, which happens when we are switching the
    context of the packet (namespace, etc.), we should reset the
    secmark.  From Thomas Graf.

 9) ->ndo_gso_check() needs to do more than return true/false, it also
    has to allow the driver to clear netdev feature bits in order for
    the caller to be able to proceed properly.  From Jesse Gross.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (62 commits)
  genetlink: A genl_bind() to an out-of-range multicast group should not WARN().
  netlink/genetlink: pass network namespace to bind/unbind
  ne2k-pci: Add pci_disable_device in error handling
  bonding: change error message to debug message in __bond_release_one()
  genetlink: pass multicast bind/unbind to families
  netlink: call unbind when releasing socket
  netlink: update listeners directly when removing socket
  genetlink: pass only network namespace to genl_has_listeners()
  netlink: rename netlink_unbind() to netlink_undo_bind()
  net: Generalize ndo_gso_check to ndo_features_check
  net: incorrect use of init_completion fixup
  neigh: remove next ptr from struct neigh_table
  net: xilinx: Remove unnecessary temac_property in the driver
  net: phy: micrel: use generic config_init for KSZ8021/KSZ8031
  net/core: Handle csum for CHECKSUM_COMPLETE VXLAN forwarding
  openvswitch: fix odd_ptr_err.cocci warnings
  Bluetooth: Fix accepting connections when not using mgmt
  Bluetooth: Fix controller configuration with HCI_QUIRK_INVALID_BDADDR
  brcmfmac: Do not crash if platform data is not populated
  ipw2200: select CFG80211_WEXT
  ...
2014-12-30 10:45:47 -08:00