linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-12-10 04:24:17 +08:00

Author	SHA1	Message	Date
Jingoo Han	ba935f4097	EDAC: Remove DEFINE_PCI_DEVICE_TABLE macro Currently, there is no other bus that has something like this macro for their device ids. Thus, DEFINE_PCI_DEVICE_TABLE macro should be removed. Signed-off-by: Jingoo Han <jg1.han@samsung.com> Link: http://lkml.kernel.org/r/001c01ceefb3$5724d860$056e8920$%han@samsung.com [ Boris: swap commit message with better one. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2013-12-06 10:23:41 +01:00
Linus Torvalds	cdd278db0e	Merge branch 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac Pull EDAC driver updates from Mauro Carvalho Chehab: - sb_edac: add support for Ivy Bridge support - cell_edac: add a missing of_node_put() call * 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac: cell_edac: fix missing of_node_put sb_edac: add support for Ivy Bridge sb_edac: avoid decoding the same error multiple times sb_edac: rename mci_bind_devs() sb_edac: enable multiple PCI id tables to be used sb_edac: rework sad_pkg sb_edac: allow different interleave lists sb_edac: allow different dram_rule arrays sb_edac: isolate TOHM retrieval sb_edac: rename pci_br sb_edac: isolate TOLM retrieval sb_edac: make RANK_CFG_A value part of sbridge_info	2013-11-18 14:51:52 -08:00
Aristeu Rozanski	4d715a805b	sb_edac: add support for Ivy Bridge Since Ivy Bridge memory controller is very similar to Sandy Bridge, it's wiser to modify sb_edac to support both instead of creating another driver. [m.chehab@samsung.com: Fix CodingStyle] Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>	2013-11-14 17:13:22 -02:00
Aristeu Rozanski	be3036d220	sb_edac: avoid decoding the same error multiple times Whenever the extended error reporting is active, multiple MCEs will be generated for the same event, which will lead to multiple repeated errors to be reported. So check ADDRV and only decode the error if the MCE address is valid. Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>	2013-11-14 16:50:03 -02:00
Aristeu Rozanski	ea779b5a09	sb_edac: rename mci_bind_devs() This is in preparation for Ivy Bridge support Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>	2013-11-14 16:49:47 -02:00
Aristeu Rozanski	5153a0f94c	sb_edac: enable multiple PCI id tables to be used This is needed to allow separated PCI id tables for Sandy Bridge and Ivy Bridge. Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>	2013-11-14 16:49:27 -02:00
Aristeu Rozanski	cc311991a7	sb_edac: rework sad_pkg This is in preparation for Ivy Bridge support Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>	2013-11-14 16:49:04 -02:00
Aristeu Rozanski	ef1ce51e7b	sb_edac: allow different interleave lists This is in preparation for Ivy Bridge support Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>	2013-11-14 16:48:42 -02:00
Aristeu Rozanski	464f1d829a	sb_edac: allow different dram_rule arrays This is in preparation for Ivy Bridge support Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>	2013-11-14 16:48:16 -02:00
Aristeu Rozanski	8fd6a43ac9	sb_edac: isolate TOHM retrieval This is preparation of Ivy Bridge support. Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>	2013-11-14 16:43:17 -02:00
Aristeu Rozanski	5f8a1b8a7b	sb_edac: rename pci_br Ivy Bridge has more than one, so rename pci_br to pci_br0 Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>	2013-11-14 16:43:06 -02:00
Aristeu Rozanski	fb79a50926	sb_edac: isolate TOLM retrieval This is in preparation for the Ivy Bridge support. Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>	2013-11-14 16:25:27 -02:00
Aristeu Rozanski	ef1e8d03b1	sb_edac: make RANK_CFG_A value part of sbridge_info This is in preparation of Ivy Bridge support. Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>	2013-11-14 16:20:37 -02:00
Chen, Gong	10ef6b0dff	bitops: Introduce a more generic BITMASK macro GENMASK is used to create a contiguous bitmask([hi:lo]). It is implemented twice in current kernel. One is in EDAC driver, the other is in SiS/XGI FB driver. Move it to a more generic place for other usage. Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Winischhofer <thomas@winischhofer.net> Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com> Acked-by: Borislav Petkov <bp@suse.de> Acked-by: Mauro Carvalho Chehab <m.chehab@samsung.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2013-10-21 15:12:01 -07:00
Luck, Tony	de4772c621	edac: sb_edac.c should not require prescence of IMC_DDRIO device The Sandy Bridge EDAC driver uses a register in the IMC_DDRIO CSR space to determine the type of DIMMs (registered or unregistered). But this device does not exist on some single socket Sandy Bridge servers. While the type of DIMMs is nice to know, it is not essential for this driver's other functions. So it seems harsh to have it refuse to load at all when it cannot find this device. Make the check for this device be optional. If it isn't present just report the memory type as "MEM_UNKNOWN". Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2013-04-29 10:32:40 -03:00
Mauro Carvalho Chehab	1339730e73	Linux 3.8-rc7 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.13 (GNU/Linux) iQEcBAABAgAGBQJRFWw4AAoJEHm+PkMAQRiGnTAH/jBHA2umNc3lc7VkUpusve4q GGIlNzYh6iuvIGwKQVj9YPsl37qtQnkDUmY8f8WxZjfxiIDw3TkRKDgNLJaM3Jy5 E426/FGlRx/Iia5+4tuBeoVYMoIPnndgW5lEAMRK1SvhTByhIYAXsaM0UwPBetb+ Z5NMdH1f1HVF7RCCmHAkzEk9z78UpdeCzI0t0XuasP2hp2ARAcE1okdO7fNaLiyo EmenGhRvy9bAsbRwV0rCdl0rQiZXEYM353DWS7n6q4fyVm8MXFwloUxnWCJTzOIL ZLJaz18adFj7Ip/X6ksnMQiQU2Q3B7aThs5ycv0QGxxL2rDFveYRRQ5ICrXOy3M= =jjBc -----END PGP SIGNATURE----- Merge tag 'v3.8-rc7' into next Linux 3.8-rc7 * tag 'v3.8-rc7': (12052 commits) Linux 3.8-rc7 net: sctp: sctp_endpoint_free: zero out secret key data net: sctp: sctp_setsockopt_auth_key: use kzfree instead of kfree atm/iphase: rename fregt_t -> ffreg_t ARM: 7641/1: memory: fix broken mmap by ensuring TASK_UNMAPPED_BASE is aligned ARM: DMA mapping: fix bad atomic test ARM: realview: ensure that we have sufficient IRQs available ARM: GIC: fix GIC cpumask initialization net: usb: fix regression from FLAG_NOARP code l2tp: dont play with skb->truesize net: sctp: sctp_auth_key_put: use kzfree instead of kfree netback: correct netbk_tx_err to handle wrap around. xen/netback: free already allocated memory on failure in xen_netbk_get_requests xen/netback: don't leak pages on failure in xen_netbk_tx_check_gop. xen/netback: shutdown the ring if it contains garbage. drm/ttm: fix fence locking in ttm_buffer_object_transfer, 2nd try virtio_console: Don't access uninitialized data. net: qmi_wwan: add more Huawei devices, including E320 net: cdc_ncm: add another Huawei vendor specific device ipv6/ip6_gre: fix error case handling in ip6gre_tunnel_xmit() ...	2013-02-20 15:45:52 -03:00
Greg Kroah-Hartman	9b3c6e85c2	Drivers: edac: remove __dev* attributes. CONFIG_HOTPLUG is going away as an option. As a result, the __dev* markings need to be removed. This change removes the use of __devinit, __devexit_p, and __devexit from these drivers. Based on patches originally written by Bill Pemberton, but redone by me in order to handle some of the coding style issues better, by hand. Cc: Bill Pemberton <wfp5p@virginia.edu> Cc: Doug Thompson <dougthompson@xmission.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Mark Gross <mark.gross@intel.com> Cc: Jason Uhlenkott <juhlenko@akamai.com> Cc: Mauro Carvalho Chehab <mchehab@redhat.com> Cc: Tim Small <tim@buttersideup.com> Cc: Ranganathan Desikan <ravi@jetztechnologies.com> Cc: "Arvind R." <arvino55@gmail.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Daney <david.daney@cavium.com> Cc: Egor Martovetsky <egor@pasemi.com> Cc: Olof Johansson <olof@lixom.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-01-03 15:57:03 -08:00
Mauro Carvalho Chehab	da14d93d95	sb_edac: add a missing /n on a debug message [ 17.024963] EDAC DEBUG: get_memory_layout: TOHM: 132.160 GB (0x0000002043ffffff)<7>[ 17.024971] EDAC DEBUG: get_memory_layout: SAD#0 DRAM up to 33.792 GB (0x0000000840000000) Interleave: 8:6 reg=0x000083c3 Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-12-21 08:01:18 -02:00
Mauro Carvalho Chehab	deb09ddaff	sb_edac: Avoid overflow errors at memory size calculation Sandy bridge EDAC is calculating the memory size with overflow. Basically, the size field and the integer calculation is using 32 bits. More bits are needed, when the DIMM memories have high density. The net result is that memories are improperly reported there, when high-density DIMMs are used: EDAC DEBUG: in drivers/edac/sb_edac.c, line at 591: mc#0: channel 0, dimm 0, -16384 Mb (-4194304 pages) bank: 8, rank: 2, row: 0x10000, col: 0x800 EDAC DEBUG: in drivers/edac/sb_edac.c, line at 591: mc#0: channel 1, dimm 0, -16384 Mb (-4194304 pages) bank: 8, rank: 2, row: 0x10000, col: 0x800 As the number of pages value is handled at the EDAC core as unsigned ints, the driver shows the 16 GB memories at sysfs interface as 16760832 MB! The fix is simple: calculate the number of pages as unsigned 64-bits integer. After the patch, the memory size (16 GB) is properly detected: EDAC DEBUG: in drivers/edac/sb_edac.c, line at 592: mc#0: channel 0, dimm 0, 16384 Mb (4194304 pages) bank: 8, rank: 2, row: 0x10000, col: 0x800 EDAC DEBUG: in drivers/edac/sb_edac.c, line at 592: mc#0: channel 1, dimm 0, 16384 Mb (4194304 pages) bank: 8, rank: 2, row: 0x10000, col: 0x800 Cc: stable@kernel.org Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-09-25 07:38:20 -03:00
Mauro Carvalho Chehab	c2078e4c91	Merge branch 'devel' * devel: (33 commits) edac i5000, i5400: fix pointer math in i5000_get_mc_regs() edac: allow specifying the error count with fake_inject edac: add support for Calxeda highbank L2 cache ecc edac: add support for Calxeda highbank memory controller edac: create top-level debugfs directory sb_edac: properly handle error count i7core_edac: properly handle error count edac: edac_mc_handle_error(): add an error_count parameter edac: remove arch-specific parameter for the error handler amd64_edac: Don't pass driver name as an error parameter edac_mc: check for allocation failure in edac_mc_alloc() edac: Increase version to 3.0.0 edac_mc: Cleanup per-dimm_info debug messages edac: Convert debugfX to edac_dbg(X, edac: Use more normal debugging macro style edac: Don't add __func__ or __FILE__ for debugf[0-9] msgs Edac: Add ABI Documentation for the new device nodes edac: move documentation ABI to ABI/testing/sysfs-devices-edac i7core_edac: change the mem allocation scheme to make Documentation/kobject.txt happy edac: change the mem allocation scheme to make Documentation/kobject.txt happy ...	2012-07-29 21:11:05 -03:00
Mauro Carvalho Chehab	c10538396b	sb_edac: properly handle error count Instead of reporting the error count via driver-specific details, use the new way provided by edac_mc_handle_error. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-06-12 12:15:49 -03:00
Mauro Carvalho Chehab	9eb07a7fb8	edac: edac_mc_handle_error(): add an error_count parameter In order to avoid loosing error events, it is desirable to group error events together and generate a single trace for several identical errors. The trace API already allows reporting multiple errors. Change the handle_error function to also allow that. The changes at the drivers were made by this small script: $file .=$_ while (<>); $file =~ s/(edac_mc_handle_error)\s*\(([^\,]+)\,([^\,]+)\,/$1($2,$3, 1,/g; print $file; Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-06-12 12:15:47 -03:00
Mauro Carvalho Chehab	03f7eae80f	edac: remove arch-specific parameter for the error handler Remove the arch-dependent parameter, as it were not used, as the MCE tracepoint weren't implemented. It probably doesn't make sense to have an MCE-specific tracepoint, as this will cost more bytes at the tracepoint, and tracepoint is not free. The changes at the EDAC drivers were done by this small perl script: $file .=$_ while (<>); $file =~ s/(edac_mc_handle_error)\s$([^\;]+)\,([^\,$]+)\s\)/$1($2)/g; print $file; Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-06-11 13:23:52 -03:00
Joe Perches	956b9ba156	edac: Convert debugfX to edac_dbg(X, Use a more common debugging style. Remove __FILE__ uses, add missing newlines, coalesce formats and align arguments. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-06-11 13:23:49 -03:00
Mauro Carvalho Chehab	dd23cd6eb1	edac: Don't add __func__ or __FILE__ for debugf[0-9] msgs The debug macro already adds that. Most of the work here was made by this small script: $f .=$_ while (<>); $f =~ s/(debugf[0-9]\s$\s)__FILE__\s": /\1"/g; $f =~ s/(debugf[0-9]\s\(\s)__FILE__\s/\1/g; $f =~ s/(debugf[0-9]\s\(\s)__FILE__\s"MC: /\1"/g; $f =~ s/(debugf[0-9]\s\(\")\%s[\:\,\($]\s([^\"]\s[^\)]+)__func__\s\,\s/\1\2/g; $f =~ s/(debugf[0-9]\s$\")\%s[\:\,\($]\s([^\"]\s[^\)]+),\s__func__\s\)/\1\2)/g; $f =~ s/(debugf[0-9]\s$\"MC\:\s)\%s[\:\,\($]\s([^\"]\s[^\)]+)__func__\s\,\s/\1\2/g; $f =~ s/(debugf[0-9]\s$\"MC\:\s)\%s[\:\,\($]\s([^\"]\s[^\)]+),\s__func__\s*\)/\1\2)/g; $f =~ s/\"MC\: \\n\"/"MC:\\n"/g; print $f; After running the script, manual cleanups were done to fix it the remaining places. While here, removed the __LINE__ on most places, as it doesn't actually give useful info on most places. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-06-11 13:23:47 -03:00
Mauro Carvalho Chehab	fd687502dc	edac: Rename the parent dev to pdev As EDAC doesn't use struct device itself, it created a parent dev pointer called as "pdev". Now that we'll be converting it to use struct device, instead of struct devsys, this needs to be fixed. No functional changes. Reviewed-by: Aristeu Rozanski <arozansk@redhat.com> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Mark Gross <mark.gross@intel.com> Cc: Jason Uhlenkott <juhlenko@akamai.com> Cc: Tim Small <tim@buttersideup.com> Cc: Ranganathan Desikan <ravi@jetztechnologies.com> Cc: "Arvind R." <arvino55@gmail.com> Cc: Olof Johansson <olof@lixom.net> Cc: Egor Martovetsky <egor@pasemi.com> Cc: Michal Marek <mmarek@suse.cz> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Joe Perches <joe@perches.com> Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Hitoshi Mitake <h.mitake@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com> Cc: Shaohui Xie <Shaohui.Xie@freescale.com> Cc: Josh Boyer <jwboyer@gmail.com> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-06-11 11:56:06 -03:00
Chen Gong	2cbb587d3b	edac: fix the error about memory type detection on SandyBridge On SandyBridge, DDRIOA(Dev: 17 Func: 0 Offset: 328) is used to detect whether DIMM is RDIMM/LRDIMM, not TA(Dev: 15 Func: 0). Signed-off-by: Chen Gong <gong.chen@linux.intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-06-11 11:49:51 -03:00
Chen Gong	e35fca4791	edac: avoid mce decoding crash after edac driver unloaded Some edac drivers register themselves as mce decoders via notifier_chain. But in current notifier_chain implementation logic, it doesn't accept same notifier registered twice. If so, it will be wrong when adding/removing the element from the list. For example, on one SandyBridge platform, remove module sb_edac and then trigger one error, it will hit oops because it has no mce decoder registered but related notifier_chain still points to an invalid callback function. Here is an example: Call Trace: [<ffffffff8150ef6a>] atomic_notifier_call_chain+0x1a/0x20 [<ffffffff8102b936>] mce_log+0x46/0x180 [<ffffffff8102eaea>] apei_mce_report_mem_error+0x4a/0x60 [<ffffffff812e19d2>] ghes_do_proc+0x192/0x210 [<ffffffff812e2066>] ghes_proc+0x46/0x70 [<ffffffff812e20d8>] ghes_notify_sci+0x48/0x80 [<ffffffff8150ef05>] notifier_call_chain+0x55/0x80 [<ffffffff81076f1a>] __blocking_notifier_call_chain+0x5a/0x80 [<ffffffff812aea11>] ? acpi_os_wait_events_complete+0x23/0x23 [<ffffffff81076f56>] blocking_notifier_call_chain+0x16/0x20 [<ffffffff812ddc4d>] acpi_hed_notify+0x19/0x1b [<ffffffff812b16bd>] acpi_device_notify+0x19/0x1b [<ffffffff812beb38>] acpi_ev_notify_dispatch+0x67/0x7f [<ffffffff812aea3a>] acpi_os_execute_deferred+0x29/0x36 [<ffffffff81069dc2>] process_one_work+0x132/0x450 [<ffffffff8106bbcb>] worker_thread+0x17b/0x3c0 [<ffffffff8106ba50>] ? manage_workers+0x120/0x120 [<ffffffff81070aee>] kthread+0x9e/0xb0 [<ffffffff81514724>] kernel_thread_helper+0x4/0x10 [<ffffffff81070a50>] ? kthread_freezable_should_stop+0x70/0x70 [<ffffffff81514720>] ? gs_change+0x13/0x13 Code: f3 49 89 d4 45 85 ed 4d 89 c6 48 8b 0f 74 48 48 85 c9 75 17 eb 41 0f 1f 80 00 00 00 00 41 83 ed 01 4c 89 f9 74 22 4d 85 ff 74 1d <4c> 8b 79 08 4c 89 e2 48 89 de 48 89 cf ff 11 4d 85 f6 74 04 41 RIP [<ffffffff8150eef6>] notifier_call_chain+0x46/0x80 RSP <ffff88042868fb20> CR2: ffffffffa01af838 ---[ end trace 0100930068e73e6f ]--- BUG: unable to handle kernel paging request at fffffffffffffff8 IP: [<ffffffff810705b0>] kthread_data+0x10/0x20 PGD `1a0d067` PUD 1a0e067 PMD 0 Oops: 0000 [#2] SMP Only i7core_edac and sb_edac have such issues because they have more than one memory controller which means they have to register mce decoder many times. Cc: <stable@vger.kernel.org> # 3.2 and upper Signed-off-by: Chen Gong <gong.chen@linux.intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-06-11 11:49:51 -03:00
Linus Torvalds	87a5af24e5	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac Pull EDAC internal API changes from Mauro Carvalho Chehab: "This changeset is the first part of a series of patches that fixes the EDAC sybsystem. On this set, it changes the Kernel EDAC API in order to properly represent the Intel i3/i5/i7, Xeon 3xxx/5xxx/7xxx, and Intel E5-xxxx memory controllers. The EDAC core used to assume that: - the DRAM chip select pin is directly accessed by the memory controller - when multiple channels are used, they're all filled with the same type of memory. None of the above premises is true on Intel memory controllers since 2002, when RAMBUS and FB-DIMMs were introduced, and Advanced Memory Buffer or by some similar technologies hides the direct access to the DRAM pins. So, the existing drivers for those chipsets had to lie to the EDAC core, in general telling that just one channel is filled. That produces some hard to understand error messages like: EDAC MC0: CE row 3, channel 0, label "DIMM1": 1 Unknown error(s): memory read error on FATAL area : cpu=0 Err=0008:00c2 (ch=2), addr = 0xad1f73480 => socket=0, Channel=0(mask=2), rank=1 The location information there (row3 channel 0) is completely bogus: it has no physical meaning, and are just some random values that the driver uses to talk with the EDAC core. The error actually happened at CPU socket 0, channel 0, slot 1, but this is not reported anywhere, as the EDAC core doesn't know anything about the memory layout. So, only advanced users that know how the EDAC driver works and that tests their systems to see how DIMMs are mapped can actually benefit for such error logs. This patch series fixes the error report logic, in order to allow the EDAC to expose the memory architecture used by them to the EDAC core. So, as the EDAC core now understands how the memory is organized, it can provide an useful report: EDAC MC0: CE memory read error on DIMM1 (channel:0 slot:1 page:0x364b1b offset:0x600 grain:32 syndrome:0x0 - count:1 area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:4) The location of the DIMM where the error happened is reported by "MC0" (cpu socket #0), at "channel:0 slot:1" location, and matches the physical location of the DIMM. There are two remaining issues not covered by this patch series: - The EDAC sysfs API will still report bogus values. So, userspace tools like edac-utils will still use the bogus data; - Add a new tracepoint-based way to get the binary information about the errors. Those are on a second series of patches (also at -next), but will probably miss the train for 3.5, due to the slow review process." Fix up trivial conflict (due to spelling correction of removed code) in drivers/edac/edac_device.c * git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac: (42 commits) i7core: fix ranks information at the per-channel struct i5000: Fix the fatal error handling i5100_edac: Fix a warning when compiled with 32 bits i82975x_edac: Test nr_pages earlier to save a few CPU cycles e752x_edac: provide more info about how DIMMS/ranks are mapped i5000_edac: Fix the logic that retrieves memory information i5400_edac: improve debug messages to better represent the filled memory edac: Cleanup the logs for i7core and sb edac drivers edac: Initialize the dimm label with the known information edac: Remove the legacy EDAC ABI x38_edac: convert driver to use the new edac ABI tile_edac: convert driver to use the new edac ABI sb_edac: convert driver to use the new edac ABI r82600_edac: convert driver to use the new edac ABI ppc4xx_edac: convert driver to use the new edac ABI pasemi_edac: convert driver to use the new edac ABI mv64x60_edac: convert driver to use the new edac ABI mpc85xx_edac: convert driver to use the new edac ABI i82975x_edac: convert driver to use the new edac ABI i82875p_edac: convert driver to use the new edac ABI ...	2012-05-29 18:32:37 -07:00
Mauro Carvalho Chehab	e17a2f42a4	edac: Cleanup the logs for i7core and sb edac drivers Remove some information that it is duplicated at the MCE log, and don't have much usage for the error. Those data will be added again, when creating a trace function that outputs both memory errors and MCE fields. Cc: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:13:51 -03:00
Mauro Carvalho Chehab	ca0907b9e4	edac: Remove the legacy EDAC ABI Now that all drivers got converted to use the new ABI, we can drop the old one. Acked-by: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:13:50 -03:00
Mauro Carvalho Chehab	c36e3e7768	sb_edac: convert driver to use the new edac ABI The legacy edac ABI is going to be removed. Port the driver to use and benefit from the new API functionality. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:13:48 -03:00
Mauro Carvalho Chehab	a895bf8b1e	edac: move nr_pages to dimm struct The number of pages is a dimm property. Move it to the dimm struct. After this change, it is possible to add sysfs nodes for the DIMM's that will properly represent the DIMM stick properties, including its size. A TODO fix here is to properly represent dual-rank/quad-rank DIMMs when the memory controller represents the memory via chip select rows. Reviewed-by: Aristeu Rozanski <arozansk@redhat.com> Acked-by: Borislav Petkov <borislav.petkov@amd.com> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Mark Gross <mark.gross@intel.com> Cc: Jason Uhlenkott <juhlenko@akamai.com> Cc: Tim Small <tim@buttersideup.com> Cc: Ranganathan Desikan <ravi@jetztechnologies.com> Cc: "Arvind R." <arvino55@gmail.com> Cc: Olof Johansson <olof@lixom.net> Cc: Egor Martovetsky <egor@pasemi.com> Cc: Michal Marek <mmarek@suse.cz> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Joe Perches <joe@perches.com> Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Hitoshi Mitake <h.mitake@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com> Cc: Shaohui Xie <Shaohui.Xie@freescale.com> Cc: Josh Boyer <jwboyer@gmail.com> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:10:58 -03:00
Mauro Carvalho Chehab	5e2af0c09e	edac: Don't initialize csrow's first_page & friends when not needed Almost all edac drivers initialize csrow_info->first_page, csrow_info->last_page and csrow_info->page_mask. Those vars are used inside the EDAC core, in order to calculate the csrow affected by an error, by using the routine edac_mc_find_csrow_by_page(). However, very few drivers actually use it: e752x_edac.c e7xxx_edac.c i3000_edac.c i82443bxgx_edac.c i82860_edac.c i82875p_edac.c i82975x_edac.c r82600_edac.c There also a few other drivers that have their own calculus formula internally using those vars. All the others are just wasting time by initializing those data. While initializing data without using them won't cause any troubles, as those information is stored at the wrong place (at csrows structure), it is better to remove what is unused, in order to simplify the next patch. Reviewed-by: Aristeu Rozanski <arozansk@redhat.com> Acked-by: Borislav Petkov <borislav.petkov@amd.com> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Hitoshi Mitake <h.mitake@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com> Cc: Josh Boyer <jwboyer@gmail.com> Cc: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:10:58 -03:00
Mauro Carvalho Chehab	084a4fccef	edac: move dimm properties to struct dimm_info On systems based on chip select rows, all channels need to use memories with the same properties, otherwise the memories on channels A and B won't be recognized. However, such assumption is not true for all types of memory controllers. Controllers for FB-DIMM's don't have such requirements. Also, modern Intel controllers seem to be capable of handling such differences. So, we need to get rid of storing the DIMM information into a per-csrow data, storing it, instead at the right place. The first step is to move grain, mtype, dtype and edac_mode to the per-dimm struct. Reviewed-by: Aristeu Rozanski <arozansk@redhat.com> Reviewed-by: Borislav Petkov <borislav.petkov@amd.com> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Mark Gross <mark.gross@intel.com> Cc: Jason Uhlenkott <juhlenko@akamai.com> Cc: Tim Small <tim@buttersideup.com> Cc: Ranganathan Desikan <ravi@jetztechnologies.com> Cc: "Arvind R." <arvino55@gmail.com> Cc: Olof Johansson <olof@lixom.net> Cc: Egor Martovetsky <egor@pasemi.com> Cc: Michal Marek <mmarek@suse.cz> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Joe Perches <joe@perches.com> Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Hitoshi Mitake <h.mitake@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: James Bottomley <James.Bottomley@parallels.com> Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com> Cc: Shaohui Xie <Shaohui.Xie@freescale.com> Cc: Josh Boyer <jwboyer@gmail.com> Cc: Mike Williams <mike@mikebwilliams.com> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:10:58 -03:00
Mauro Carvalho Chehab	a7d7d2e1a0	edac: Create a dimm struct and move the labels into it The way a DIMM is currently represented implies that they're linked into a per-csrow struct. However, some drivers don't see csrows, as they're ridden behind some chip like the AMB's on FBDIMM's, for example. This forced drivers to fake^Wvirtualize a csrow struct, and to create a mess under csrow/channel original's concept. Move the DIMM labels into a per-DIMM struct, and add there the real location of the socket, in terms of csrow/channel. Latter patches will modify the location to properly represent the memory architecture. All other drivers will use a per-csrow type of location. Some of those drivers will require a latter conversion, as they also fake the csrows internally. TODO: While this patch doesn't change the existing behavior, on csrows-based memory controllers, a csrow/channel pair points to a memory rank. There's a known bug at the EDAC core that allows having different labels for the same DIMM, if it has more than one rank. A latter patch is need to merge the several ranks for a DIMM into the same dimm_info struct, in order to avoid having different labels for the same DIMM. The edac_mc_alloc() will now contain a per-dimm initialization loop that will be changed by latter patches in order to match other types of memory architectures. Reviewed-by: Aristeu Rozanski <arozansk@redhat.com> Reviewed-by: Borislav Petkov <borislav.petkov@amd.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Ranganathan Desikan <ravi@jetztechnologies.com> Cc: "Arvind R." <arvino55@gmail.com> Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:10:57 -03:00
David Mackey	15ed103a98	edac: Fix spelling errors. Signed-off-by: David Mackey <tdmackey@twitter.com> Signed-off-by: Vinson Lee <vlee@twitter.com> Acked-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2012-04-30 13:28:41 +02:00
Linus Torvalds	f0f3680e50	Merge branch 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac Pull EDAC fixes from Mauro Carvalho Chehab: "A series of EDAC driver fixes. It also has one core fix at the documentation, and a rename patch, fixing the name of the struct that contains the rank information." * 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac: edac: rename channel_info to rank_info i5400_edac: Avoid calling pci_put_device() twice edac: i5100 ack error detection register after each read edac: i5100 fix erroneous define for M1Err edac: sb_edac: Fix a wrong value setting for the previous value edac: sb_edac: Fix a INTERLEAVE_MODE() misuse edac: sb_edac: Let the driver depend on PCI_MMCONFIG edac: Improve the comments to better describe the memory concepts edac/ppc4xx_edac: Fix compilation Fix sb_edac compilation with 32 bits kernels	2012-03-28 14:24:40 -07:00
Hui Wang	7fae0db439	edac: sb_edac: Fix a wrong value setting for the previous value >From the driver design, the variable limit wants to compare with its previous value, we should set the value of limit instead of the value of tmp_mb to the variable prev. Signed-off-by: Hui Wang <jason77.wang@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-03-21 15:20:11 -03:00
Hui Wang	ad9c40b7dd	edac: sb_edac: Fix a INTERLEAVE_MODE() misuse We can identify dram interleave mode from the Dram Rule register rather than Dram Interleave list register. In this context, the reg of INTERLEAVE_MODE(reg) contains the Dram Interleave list register, we can't get interleave mode from the reg, while the variable interleave_mode saves the the mode got from the Dram Rule register, so we use the variable to replace INTERLEAVE_MDDE(reg) here. Signed-off-by: Hui Wang <jason77.wang@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-03-21 15:20:02 -03:00
Mauro Carvalho Chehab	5b889e379f	Fix sb_edac compilation with 32 bits kernels As reported by Josh Boyer <jwboyer@redhat.com>: > drivers/edac/sb_edac.c: In function 'get_memory_error_data': > drivers/edac/sb_edac.c:861:2: warning: left shift count >= width of type > [enabled by default] > <snip> > ERROR: "__udivdi3" [drivers/edac/sb_edac.ko] undefined! > make[1]: * [__modpost] Error 1 > make: * [modules] Error 2 PS.: compile-tested only Reported-by: Josh Boyer <jwboyer@redhat.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-03-21 15:19:38 -03:00
Lionel Debroux	36c46f31df	EDAC: Make pci_device_id tables __devinitconst. These const tables are currently marked __devinitdata, but Documentation/PCI/pci.txt says: "o The ID table array should be marked __devinitconst; this is done automatically if the table is declared with DEFINE_PCI_DEVICE_TABLE()." So use DEFINE_PCI_DEVICE_TABLE(x). Based on PaX and earlier work by Andi Kleen. Signed-off-by: Lionel Debroux <lionel_debroux@yahoo.fr> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2012-03-19 12:04:54 +01:00
Linus Torvalds	edf7c8148e	Merge branch 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: add IRQ context simulation in module mce-inject x86, mce, therm_throt: Don't report power limit and package level thermal throttle events in mcelog x86, MCE: Drain mcelog buffer x86, mce: Add wrappers for registering on the decode chain	2012-01-06 15:02:37 -08:00
Kevin Winchester	141168c36c	x86: Simplify code by removing a !SMP #ifdefs from 'struct cpuinfo_x86' Several fields in struct cpuinfo_x86 were not defined for the !SMP case, likely to save space. However, those fields still have some meaning for UP, and keeping them allows some #ifdef removal from other files. The additional size of the UP kernel from this change is not significant enough to worry about keeping up the distinction: text data bss dec hex filename 4737168 506459 972040 6215667 5ed7f3 vmlinux.o.before 4737444 506459 972040 6215943 5ed907 vmlinux.o.after for a difference of 276 bytes for an example UP config. If someone wants those 276 bytes back badly then it should be implemented in a cleaner way. Signed-off-by: Kevin Winchester <kjwinchester@gmail.com> Cc: Steffen Persvold <sp@numascale.com> Link: http://lkml.kernel.org/r/1324428742-12498-1-git-send-email-kjwinchester@gmail.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-21 09:25:09 +01:00
Borislav Petkov	3653ada5d3	x86, mce: Add wrappers for registering on the decode chain No functionality change, this is done so that in a follow-on patch all queued-up MCEs can be decoded after registering on the chain. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2011-12-14 12:50:12 +01:00
Mark A. Grondona	c6e13b528f	EDAC: Fix incorrect edac mode reporting in sb_edac The edac driver for Sandy Bridge was found to be reporting "FPM" for edac_mode, which clearly doesn't make sense. It was found that sb_edac.c:get_dimm_config was reusing a variable for both mem_type and edac_type, and thus was overwriting the value after setting it correctly. This patch fixes that issue. Before the patch: /sys/devices/system/edac/mc/mc0/csrow0/edac_mode:FPM /sys/devices/system/edac/mc/mc0/csrow1/edac_mode:FPM /sys/devices/system/edac/mc/mc0/csrow2/edac_mode:FPM /sys/devices/system/edac/mc/mc0/csrow3/edac_mode:FPM After: /sys/devices/system/edac/mc/mc0/csrow0/edac_mode:S4ECD4ED /sys/devices/system/edac/mc/mc0/csrow1/edac_mode:S4ECD4ED /sys/devices/system/edac/mc/mc0/csrow2/edac_mode:S4ECD4ED /sys/devices/system/edac/mc/mc0/csrow3/edac_mode:S4ECD4ED Signed-off-by: Mark A. Grondona <mgrondona@llnl.gov> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2011-11-01 10:01:55 -02:00
Mauro Carvalho Chehab	3d78c9af78	edac: sb_edac: Add it to the building system Some changes on it were required due to changeset cd90cc84c6bf0, that changed the glue with the MCE logic. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2011-11-01 10:01:54 -02:00
Mauro Carvalho Chehab	eebf11a016	edac: Add an experimental new driver to support Sandy Bridge CPU's This driver is known to work on mine and Tony's test environments, using software error injection, and a partial hardware/software error injection tool. There's no broader range test yet to double check if the error decoding logic will actually point to the right DIMM, so use it with care. More tests are required to be sure that the driver will work on all different types of memory configurations. If you're willing to risk using it, I suggest you to enable EDAC debugs for your test machines, as the debug logs helps to track what's going inside the driver. Please feed me with bug reports, if you notice that the driver is miss-behaving. Tested-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2011-11-01 10:01:53 -02:00

48 Commits