mirror of
https://gcc.gnu.org/git/gcc.git
synced 2024-11-23 19:03:59 +08:00
re PR target/54222 ([avr] Implement fixed-point support)
libgcc/ PR target/54222 * config/avr/lib1funcs-fixed.S: New file. * config/avr/lib1funcs.S: Include it. Undefine some divmodsi after they are used. (neg2, neg4): New macros. (__mulqihi3,__umulqihi3,__mulhi3): Rewrite non-MUL variants. (__mulhisi3,__umulhisi3,__mulsi3): Rewrite non-MUL variants. (__umulhisi3): Speed up MUL variant if there is enough flash. * config/avr/avr-lib.h (TA, UTA): Adjust according to gcc's avr-modes.def. * config/avr/t-avr (LIB1ASMFUNCS): Add: _fractqqsf, _fractuqqsf, _fracthqsf, _fractuhqsf, _fracthasf, _fractuhasf, _fractsasf, _fractusasf, _fractsfqq, _fractsfuqq, _fractsfhq, _fractsfuhq, _fractsfha, _fractsfsa, _mulqq3, _muluqq3, _mulhq3, _muluhq3, _mulha3, _muluha3, _mulsa3, _mulusa3, _divqq3, _udivuqq3, _divhq3, _udivuhq3, _divha3, _udivuha3, _divsa3, _udivusa3. (LIB2FUNCS_EXCLUDE): Add supported functions. gcc/ PR target/54222 * avr-modes.def (HA, SA, DA, TA, UTA): Adjust modes. * avr/avr-fixed.md: New file. * avr/avr.md: Include it. (cc): Add: minus. (adjust_len): Add: minus, minus64, ufract, sfract. (ALL1, ALL2, ALL4, ORDERED234): New mode iterators. (MOVMODE): Add: QQ, UQQ, HQ, UHQ, HA, UHA, SQ, USQ, SA, USA. (MPUSH): Add: HQ, UHQ, HA, UHA, SQ, USQ, SA, USA. (pushqi1, xload8_A, xload_8, movqi_insn, *reload_inqi, addqi3, subqi3, ashlqi3, *ashlqi3, ashrqi3, lshrqi3, *lshrqi3, *cmpqi, cbranchqi4, *cpse.eq): Generalize to handle all 8-bit modes in ALL1. (*movhi, reload_inhi, addhi3, *addhi3, addhi3_clobber, subhi3, ashlhi3, *ashlhi3_const, ashrhi3, *ashirhi3_const, lshrhi3, *lshrhi3_const, *cmphi, cbranchhi4): Generalize to handle all 16-bit modes in ALL2. (subhi3, casesi, strlenhi): Add clobber when expanding minus:HI. (*movsi, *reload_insi, addsi3, subsi3, ashlsi3, *ashlsi3_const, ashrsi3, *ashrhi3_const, *ashrsi3_const, lshrsi3, *lshrsi3_const, *reversed_tstsi, *cmpsi, cbranchsi4): Generalize to handle all 32-bit modes in ALL4. * avr-dimode.md (ALL8): New mode iterator. 
(adddi3, adddi3_insn, adddi3_const_insn, subdi3, subdi3_insn, subdi3_const_insn, cbranchdi4, compare_di2, compare_const_di2, ashrdi3, lshrdi3, rotldi3, ashldi3_insn, ashrdi3_insn, lshrdi3_insn, rotldi3_insn): Generalize to handle all 64-bit modes in ALL8. * config/avr/avr-protos.h (avr_to_int_mode): New prototype. (avr_out_fract, avr_out_minus, avr_out_minus64): New prototypes. * config/avr/avr.c (TARGET_FIXED_POINT_SUPPORTED_P): Define to... (avr_fixed_point_supported_p): ...this new static function. (TARGET_BUILD_BUILTIN_VA_LIST): Define to... (avr_build_builtin_va_list): ...this new static function. (avr_adjust_type_node): New static function. (avr_scalar_mode_supported_p): Allow if ALL_FIXED_POINT_MODE_P. (avr_builtin_setjmp_frame_value): Use gen_subhi3 and return new pseudo instead of gen_rtx_MINUS. (avr_print_operand, avr_operand_rtx_cost): Handle: CONST_FIXED. (notice_update_cc): Handle: CC_MINUS. (output_movqi): Generalize to handle respective fixed-point modes. (output_movhi, output_movsisf, avr_2word_insn_p): Ditto. (avr_out_compare, avr_out_plus_1): Also handle fixed-point modes. (avr_assemble_integer): Ditto. (output_reload_in_const, output_reload_insisf): Ditto. (avr_compare_pattern): Skip all modes > 4 bytes. (avr_2word_insn_p): Skip movuqq_insn, movqq_insn. (avr_out_fract, avr_out_minus, avr_out_minus64): New functions. (avr_to_int_mode): New function. (adjust_insn_length): Handle: ADJUST_LEN_SFRACT, ADJUST_LEN_UFRACT, ADJUST_LEN_MINUS, ADJUST_LEN_MINUS64. * config/avr/predicates.md (const0_operand): Allow const_fixed. (const_operand, const_or_immediate_operand): New. (nonmemory_or_const_operand): New. * config/avr/constraints.md (Ynn, Y00, Y01, Y02, Ym1, Ym2, YIJ): New constraints. * config/avr/avr.h (LONG_LONG_ACCUM_TYPE_SIZE): Define. From-SVN: r190644
This commit is contained in:
parent
2960a36853
commit
e55e405619
@ -1,3 +1,62 @@
|
||||
2012-08-24 Georg-Johann Lay <avr@gjlay.de>
|
||||
|
||||
PR target/54222
|
||||
* avr-modes.def (HA, SA, DA, TA, UTA): Adjust modes.
|
||||
* avr/avr-fixed.md: New file.
|
||||
* avr/avr.md: Include it.
|
||||
(cc): Add: minus.
|
||||
(adjust_len): Add: minus, minus64, ufract, sfract.
|
||||
(ALL1, ALL2, ALL4, ORDERED234): New mode iterators.
|
||||
(MOVMODE): Add: QQ, UQQ, HQ, UHQ, HA, UHA, SQ, USQ, SA, USA.
|
||||
(MPUSH): Add: HQ, UHQ, HA, UHA, SQ, USQ, SA, USA.
|
||||
(pushqi1, xload8_A, xload_8, movqi_insn, *reload_inqi, addqi3,
|
||||
subqi3, ashlqi3, *ashlqi3, ashrqi3, lshrqi3, *lshrqi3, *cmpqi,
|
||||
cbranchqi4, *cpse.eq): Generalize to handle all 8-bit modes in ALL1.
|
||||
(*movhi, reload_inhi, addhi3, *addhi3, addhi3_clobber, subhi3,
|
||||
ashlhi3, *ashlhi3_const, ashrhi3, *ashirhi3_const, lshrhi3,
|
||||
*lshrhi3_const, *cmphi, cbranchhi4): Generalize to handle all
|
||||
16-bit modes in ALL2.
|
||||
(subhi3, casesi, strlenhi): Add clobber when expanding minus:HI.
|
||||
(*movsi, *reload_insi, addsi3, subsi3, ashlsi3, *ashlsi3_const,
|
||||
ashrsi3, *ashrhi3_const, *ashrsi3_const, lshrsi3, *lshrsi3_const,
|
||||
*reversed_tstsi, *cmpsi, cbranchsi4): Generalize to handle all
|
||||
32-bit modes in ALL4.
|
||||
* avr-dimode.md (ALL8): New mode iterator.
|
||||
(adddi3, adddi3_insn, adddi3_const_insn, subdi3, subdi3_insn,
|
||||
subdi3_const_insn, cbranchdi4, compare_di2,
|
||||
compare_const_di2, ashrdi3, lshrdi3, rotldi3, ashldi3_insn,
|
||||
ashrdi3_insn, lshrdi3_insn, rotldi3_insn): Generalize to handle
|
||||
all 64-bit modes in ALL8.
|
||||
* config/avr/avr-protos.h (avr_to_int_mode): New prototype.
|
||||
(avr_out_fract, avr_out_minus, avr_out_minus64): New prototypes.
|
||||
* config/avr/avr.c (TARGET_FIXED_POINT_SUPPORTED_P): Define to...
|
||||
(avr_fixed_point_supported_p): ...this new static function.
|
||||
(TARGET_BUILD_BUILTIN_VA_LIST): Define to...
|
||||
(avr_build_builtin_va_list): ...this new static function.
|
||||
(avr_adjust_type_node): New static function.
|
||||
(avr_scalar_mode_supported_p): Allow if ALL_FIXED_POINT_MODE_P.
|
||||
(avr_builtin_setjmp_frame_value): Use gen_subhi3 and return new
|
||||
pseudo instead of gen_rtx_MINUS.
|
||||
(avr_print_operand, avr_operand_rtx_cost): Handle: CONST_FIXED.
|
||||
(notice_update_cc): Handle: CC_MINUS.
|
||||
(output_movqi): Generalize to handle respective fixed-point modes.
|
||||
(output_movhi, output_movsisf, avr_2word_insn_p): Ditto.
|
||||
(avr_out_compare, avr_out_plus_1): Also handle fixed-point modes.
|
||||
(avr_assemble_integer): Ditto.
|
||||
(output_reload_in_const, output_reload_insisf): Ditto.
|
||||
(avr_compare_pattern): Skip all modes > 4 bytes.
|
||||
(avr_2word_insn_p): Skip movuqq_insn, movqq_insn.
|
||||
(avr_out_fract, avr_out_minus, avr_out_minus64): New functions.
|
||||
(avr_to_int_mode): New function.
|
||||
(adjust_insn_length): Handle: ADJUST_LEN_SFRACT,
|
||||
ADJUST_LEN_UFRACT, ADJUST_LEN_MINUS, ADJUST_LEN_MINUS64.
|
||||
* config/avr/predicates.md (const0_operand): Allow const_fixed.
|
||||
(const_operand, const_or_immediate_operand): New.
|
||||
(nonmemory_or_const_operand): New.
|
||||
* config/avr/constraints.md (Ynn, Y00, Y01, Y02, Ym1, Ym2, YIJ):
|
||||
New constraints.
|
||||
* config/avr/avr.h (LONG_LONG_ACCUM_TYPE_SIZE): Define.
|
||||
|
||||
2012-08-23 Kenneth Zadeck <zadeck@naturalbridge.com>
|
||||
|
||||
* alias.c (rtx_equal_for_memref_p): Convert constant cases.
|
||||
|
@ -47,44 +47,58 @@
|
||||
[(ACC_A 18)
|
||||
(ACC_B 10)])
|
||||
|
||||
;; Supported modes that are 8 bytes wide
|
||||
(define_mode_iterator ALL8 [(DI "")
|
||||
(DQ "") (UDQ "")
|
||||
(DA "") (UDA "")
|
||||
(TA "") (UTA "")])
|
||||
|
||||
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
|
||||
;; Addition
|
||||
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
|
||||
|
||||
(define_expand "adddi3"
|
||||
[(parallel [(match_operand:DI 0 "general_operand" "")
|
||||
(match_operand:DI 1 "general_operand" "")
|
||||
(match_operand:DI 2 "general_operand" "")])]
|
||||
;; "adddi3"
|
||||
;; "adddq3" "addudq3"
|
||||
;; "addda3" "adduda3"
|
||||
;; "addta3" "adduta3"
|
||||
(define_expand "add<mode>3"
|
||||
[(parallel [(match_operand:ALL8 0 "general_operand" "")
|
||||
(match_operand:ALL8 1 "general_operand" "")
|
||||
(match_operand:ALL8 2 "general_operand" "")])]
|
||||
"avr_have_dimode"
|
||||
{
|
||||
rtx acc_a = gen_rtx_REG (DImode, ACC_A);
|
||||
rtx acc_a = gen_rtx_REG (<MODE>mode, ACC_A);
|
||||
|
||||
emit_move_insn (acc_a, operands[1]);
|
||||
|
||||
if (s8_operand (operands[2], VOIDmode))
|
||||
if (DImode == <MODE>mode
|
||||
&& s8_operand (operands[2], VOIDmode))
|
||||
{
|
||||
emit_move_insn (gen_rtx_REG (QImode, REG_X), operands[2]);
|
||||
emit_insn (gen_adddi3_const8_insn ());
|
||||
}
|
||||
else if (CONST_INT_P (operands[2])
|
||||
|| CONST_DOUBLE_P (operands[2]))
|
||||
else if (const_operand (operands[2], GET_MODE (operands[2])))
|
||||
{
|
||||
emit_insn (gen_adddi3_const_insn (operands[2]));
|
||||
emit_insn (gen_add<mode>3_const_insn (operands[2]));
|
||||
}
|
||||
else
|
||||
{
|
||||
emit_move_insn (gen_rtx_REG (DImode, ACC_B), operands[2]);
|
||||
emit_insn (gen_adddi3_insn ());
|
||||
emit_move_insn (gen_rtx_REG (<MODE>mode, ACC_B), operands[2]);
|
||||
emit_insn (gen_add<mode>3_insn ());
|
||||
}
|
||||
|
||||
emit_move_insn (operands[0], acc_a);
|
||||
DONE;
|
||||
})
|
||||
|
||||
(define_insn "adddi3_insn"
|
||||
[(set (reg:DI ACC_A)
|
||||
(plus:DI (reg:DI ACC_A)
|
||||
(reg:DI ACC_B)))]
|
||||
;; "adddi3_insn"
|
||||
;; "adddq3_insn" "addudq3_insn"
|
||||
;; "addda3_insn" "adduda3_insn"
|
||||
;; "addta3_insn" "adduta3_insn"
|
||||
(define_insn "add<mode>3_insn"
|
||||
[(set (reg:ALL8 ACC_A)
|
||||
(plus:ALL8 (reg:ALL8 ACC_A)
|
||||
(reg:ALL8 ACC_B)))]
|
||||
"avr_have_dimode"
|
||||
"%~call __adddi3"
|
||||
[(set_attr "adjust_len" "call")
|
||||
@ -99,10 +113,14 @@
|
||||
[(set_attr "adjust_len" "call")
|
||||
(set_attr "cc" "clobber")])
|
||||
|
||||
(define_insn "adddi3_const_insn"
|
||||
[(set (reg:DI ACC_A)
|
||||
(plus:DI (reg:DI ACC_A)
|
||||
(match_operand:DI 0 "const_double_operand" "n")))]
|
||||
;; "adddi3_const_insn"
|
||||
;; "adddq3_const_insn" "addudq3_const_insn"
|
||||
;; "addda3_const_insn" "adduda3_const_insn"
|
||||
;; "addta3_const_insn" "adduta3_const_insn"
|
||||
(define_insn "add<mode>3_const_insn"
|
||||
[(set (reg:ALL8 ACC_A)
|
||||
(plus:ALL8 (reg:ALL8 ACC_A)
|
||||
(match_operand:ALL8 0 "const_operand" "n Ynn")))]
|
||||
"avr_have_dimode
|
||||
&& !s8_operand (operands[0], VOIDmode)"
|
||||
{
|
||||
@ -116,30 +134,62 @@
|
||||
;; Subtraction
|
||||
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
|
||||
|
||||
(define_expand "subdi3"
|
||||
[(parallel [(match_operand:DI 0 "general_operand" "")
|
||||
(match_operand:DI 1 "general_operand" "")
|
||||
(match_operand:DI 2 "general_operand" "")])]
|
||||
;; "subdi3"
|
||||
;; "subdq3" "subudq3"
|
||||
;; "subda3" "subuda3"
|
||||
;; "subta3" "subuta3"
|
||||
(define_expand "sub<mode>3"
|
||||
[(parallel [(match_operand:ALL8 0 "general_operand" "")
|
||||
(match_operand:ALL8 1 "general_operand" "")
|
||||
(match_operand:ALL8 2 "general_operand" "")])]
|
||||
"avr_have_dimode"
|
||||
{
|
||||
rtx acc_a = gen_rtx_REG (DImode, ACC_A);
|
||||
rtx acc_a = gen_rtx_REG (<MODE>mode, ACC_A);
|
||||
|
||||
emit_move_insn (acc_a, operands[1]);
|
||||
emit_move_insn (gen_rtx_REG (DImode, ACC_B), operands[2]);
|
||||
emit_insn (gen_subdi3_insn ());
|
||||
|
||||
if (const_operand (operands[2], GET_MODE (operands[2])))
|
||||
{
|
||||
emit_insn (gen_sub<mode>3_const_insn (operands[2]));
|
||||
}
|
||||
else
|
||||
{
|
||||
emit_move_insn (gen_rtx_REG (<MODE>mode, ACC_B), operands[2]);
|
||||
emit_insn (gen_sub<mode>3_insn ());
|
||||
}
|
||||
|
||||
emit_move_insn (operands[0], acc_a);
|
||||
DONE;
|
||||
})
|
||||
|
||||
(define_insn "subdi3_insn"
|
||||
[(set (reg:DI ACC_A)
|
||||
(minus:DI (reg:DI ACC_A)
|
||||
(reg:DI ACC_B)))]
|
||||
;; "subdi3_insn"
|
||||
;; "subdq3_insn" "subudq3_insn"
|
||||
;; "subda3_insn" "subuda3_insn"
|
||||
;; "subta3_insn" "subuta3_insn"
|
||||
(define_insn "sub<mode>3_insn"
|
||||
[(set (reg:ALL8 ACC_A)
|
||||
(minus:ALL8 (reg:ALL8 ACC_A)
|
||||
(reg:ALL8 ACC_B)))]
|
||||
"avr_have_dimode"
|
||||
"%~call __subdi3"
|
||||
[(set_attr "adjust_len" "call")
|
||||
(set_attr "cc" "set_czn")])
|
||||
|
||||
;; "subdi3_const_insn"
|
||||
;; "subdq3_const_insn" "subudq3_const_insn"
|
||||
;; "subda3_const_insn" "subuda3_const_insn"
|
||||
;; "subta3_const_insn" "subuta3_const_insn"
|
||||
(define_insn "sub<mode>3_const_insn"
|
||||
[(set (reg:ALL8 ACC_A)
|
||||
(minus:ALL8 (reg:ALL8 ACC_A)
|
||||
(match_operand:ALL8 0 "const_operand" "n Ynn")))]
|
||||
"avr_have_dimode"
|
||||
{
|
||||
return avr_out_minus64 (operands[0], NULL);
|
||||
}
|
||||
[(set_attr "adjust_len" "minus64")
|
||||
(set_attr "cc" "clobber")])
|
||||
|
||||
|
||||
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
|
||||
;; Negation
|
||||
@ -180,15 +230,19 @@
|
||||
(pc)))]
|
||||
"avr_have_dimode")
|
||||
|
||||
(define_expand "cbranchdi4"
|
||||
[(parallel [(match_operand:DI 1 "register_operand" "")
|
||||
(match_operand:DI 2 "nonmemory_operand" "")
|
||||
;; "cbranchdi4"
|
||||
;; "cbranchdq4" "cbranchudq4"
|
||||
;; "cbranchda4" "cbranchuda4"
|
||||
;; "cbranchta4" "cbranchuta4"
|
||||
(define_expand "cbranch<mode>4"
|
||||
[(parallel [(match_operand:ALL8 1 "register_operand" "")
|
||||
(match_operand:ALL8 2 "nonmemory_operand" "")
|
||||
(match_operator 0 "ordered_comparison_operator" [(cc0)
|
||||
(const_int 0)])
|
||||
(label_ref (match_operand 3 "" ""))])]
|
||||
"avr_have_dimode"
|
||||
{
|
||||
rtx acc_a = gen_rtx_REG (DImode, ACC_A);
|
||||
rtx acc_a = gen_rtx_REG (<MODE>mode, ACC_A);
|
||||
|
||||
emit_move_insn (acc_a, operands[1]);
|
||||
|
||||
@ -197,25 +251,28 @@
|
||||
emit_move_insn (gen_rtx_REG (QImode, REG_X), operands[2]);
|
||||
emit_insn (gen_compare_const8_di2 ());
|
||||
}
|
||||
else if (CONST_INT_P (operands[2])
|
||||
|| CONST_DOUBLE_P (operands[2]))
|
||||
else if (const_operand (operands[2], GET_MODE (operands[2])))
|
||||
{
|
||||
emit_insn (gen_compare_const_di2 (operands[2]));
|
||||
emit_insn (gen_compare_const_<mode>2 (operands[2]));
|
||||
}
|
||||
else
|
||||
{
|
||||
emit_move_insn (gen_rtx_REG (DImode, ACC_B), operands[2]);
|
||||
emit_insn (gen_compare_di2 ());
|
||||
emit_move_insn (gen_rtx_REG (<MODE>mode, ACC_B), operands[2]);
|
||||
emit_insn (gen_compare_<mode>2 ());
|
||||
}
|
||||
|
||||
emit_jump_insn (gen_conditional_jump (operands[0], operands[3]));
|
||||
DONE;
|
||||
})
|
||||
|
||||
(define_insn "compare_di2"
|
||||
;; "compare_di2"
|
||||
;; "compare_dq2" "compare_udq2"
|
||||
;; "compare_da2" "compare_uda2"
|
||||
;; "compare_ta2" "compare_uta2"
|
||||
(define_insn "compare_<mode>2"
|
||||
[(set (cc0)
|
||||
(compare (reg:DI ACC_A)
|
||||
(reg:DI ACC_B)))]
|
||||
(compare (reg:ALL8 ACC_A)
|
||||
(reg:ALL8 ACC_B)))]
|
||||
"avr_have_dimode"
|
||||
"%~call __cmpdi2"
|
||||
[(set_attr "adjust_len" "call")
|
||||
@ -230,10 +287,14 @@
|
||||
[(set_attr "adjust_len" "call")
|
||||
(set_attr "cc" "compare")])
|
||||
|
||||
(define_insn "compare_const_di2"
|
||||
;; "compare_const_di2"
|
||||
;; "compare_const_dq2" "compare_const_udq2"
|
||||
;; "compare_const_da2" "compare_const_uda2"
|
||||
;; "compare_const_ta2" "compare_const_uta2"
|
||||
(define_insn "compare_const_<mode>2"
|
||||
[(set (cc0)
|
||||
(compare (reg:DI ACC_A)
|
||||
(match_operand:DI 0 "const_double_operand" "n")))
|
||||
(compare (reg:ALL8 ACC_A)
|
||||
(match_operand:ALL8 0 "const_operand" "n Ynn")))
|
||||
(clobber (match_scratch:QI 1 "=&d"))]
|
||||
"avr_have_dimode
|
||||
&& !s8_operand (operands[0], VOIDmode)"
|
||||
@ -254,29 +315,39 @@
|
||||
;; Shift functions from libgcc are called without defining these insns,
|
||||
;; but with them we can describe their reduced register footprint.
|
||||
|
||||
;; "ashldi3"
|
||||
;; "ashrdi3"
|
||||
;; "lshrdi3"
|
||||
;; "rotldi3"
|
||||
(define_expand "<code_stdname>di3"
|
||||
[(parallel [(match_operand:DI 0 "general_operand" "")
|
||||
(di_shifts:DI (match_operand:DI 1 "general_operand" "")
|
||||
(match_operand:QI 2 "general_operand" ""))])]
|
||||
;; "ashldi3" "ashrdi3" "lshrdi3" "rotldi3"
|
||||
;; "ashldq3" "ashrdq3" "lshrdq3" "rotldq3"
|
||||
;; "ashlda3" "ashrda3" "lshrda3" "rotlda3"
|
||||
;; "ashlta3" "ashrta3" "lshrta3" "rotlta3"
|
||||
;; "ashludq3" "ashrudq3" "lshrudq3" "rotludq3"
|
||||
;; "ashluda3" "ashruda3" "lshruda3" "rotluda3"
|
||||
;; "ashluta3" "ashruta3" "lshruta3" "rotluta3"
|
||||
(define_expand "<code_stdname><mode>3"
|
||||
[(parallel [(match_operand:ALL8 0 "general_operand" "")
|
||||
(di_shifts:ALL8 (match_operand:ALL8 1 "general_operand" "")
|
||||
(match_operand:QI 2 "general_operand" ""))])]
|
||||
"avr_have_dimode"
|
||||
{
|
||||
rtx acc_a = gen_rtx_REG (DImode, ACC_A);
|
||||
rtx acc_a = gen_rtx_REG (<MODE>mode, ACC_A);
|
||||
|
||||
emit_move_insn (acc_a, operands[1]);
|
||||
emit_move_insn (gen_rtx_REG (QImode, 16), operands[2]);
|
||||
emit_insn (gen_<code_stdname>di3_insn ());
|
||||
emit_insn (gen_<code_stdname><mode>3_insn ());
|
||||
emit_move_insn (operands[0], acc_a);
|
||||
DONE;
|
||||
})
|
||||
|
||||
(define_insn "<code_stdname>di3_insn"
|
||||
[(set (reg:DI ACC_A)
|
||||
(di_shifts:DI (reg:DI ACC_A)
|
||||
(reg:QI 16)))]
|
||||
;; "ashldi3_insn" "ashrdi3_insn" "lshrdi3_insn" "rotldi3_insn"
|
||||
;; "ashldq3_insn" "ashrdq3_insn" "lshrdq3_insn" "rotldq3_insn"
|
||||
;; "ashlda3_insn" "ashrda3_insn" "lshrda3_insn" "rotlda3_insn"
|
||||
;; "ashlta3_insn" "ashrta3_insn" "lshrta3_insn" "rotlta3_insn"
|
||||
;; "ashludq3_insn" "ashrudq3_insn" "lshrudq3_insn" "rotludq3_insn"
|
||||
;; "ashluda3_insn" "ashruda3_insn" "lshruda3_insn" "rotluda3_insn"
|
||||
;; "ashluta3_insn" "ashruta3_insn" "lshruta3_insn" "rotluta3_insn"
|
||||
(define_insn "<code_stdname><mode>3_insn"
|
||||
[(set (reg:ALL8 ACC_A)
|
||||
(di_shifts:ALL8 (reg:ALL8 ACC_A)
|
||||
(reg:QI 16)))]
|
||||
"avr_have_dimode"
|
||||
"%~call __<code_stdname>di3"
|
||||
[(set_attr "adjust_len" "call")
|
||||
|
287
gcc/config/avr/avr-fixed.md
Normal file
287
gcc/config/avr/avr-fixed.md
Normal file
@ -0,0 +1,287 @@
|
||||
;; This file contains instructions that support fixed-point operations
|
||||
;; for Atmel AVR micro controllers.
|
||||
;; Copyright (C) 2012
|
||||
;; Free Software Foundation, Inc.
|
||||
;;
|
||||
;; Contributed by Sean D'Epagnier (sean@depagnier.com)
|
||||
;; Georg-Johann Lay (avr@gjlay.de)
|
||||
|
||||
;; This file is part of GCC.
|
||||
;;
|
||||
;; GCC is free software; you can redistribute it and/or modify
|
||||
;; it under the terms of the GNU General Public License as published by
|
||||
;; the Free Software Foundation; either version 3, or (at your option)
|
||||
;; any later version.
|
||||
;;
|
||||
;; GCC is distributed in the hope that it will be useful,
|
||||
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||
;; GNU General Public License for more details.
|
||||
;;
|
||||
;; You should have received a copy of the GNU General Public License
|
||||
;; along with GCC; see the file COPYING3. If not see
|
||||
;; <http://www.gnu.org/licenses/>.
|
||||
|
||||
(define_mode_iterator ALL1Q [(QQ "") (UQQ "")])
|
||||
(define_mode_iterator ALL2Q [(HQ "") (UHQ "")])
|
||||
(define_mode_iterator ALL2A [(HA "") (UHA "")])
|
||||
(define_mode_iterator ALL2QA [(HQ "") (UHQ "")
|
||||
(HA "") (UHA "")])
|
||||
(define_mode_iterator ALL4A [(SA "") (USA "")])
|
||||
|
||||
;;; Conversions
|
||||
|
||||
(define_mode_iterator FIXED_A
|
||||
[(QQ "") (UQQ "")
|
||||
(HQ "") (UHQ "") (HA "") (UHA "")
|
||||
(SQ "") (USQ "") (SA "") (USA "")
|
||||
(DQ "") (UDQ "") (DA "") (UDA "")
|
||||
(TA "") (UTA "")
|
||||
(QI "") (HI "") (SI "") (DI "")])
|
||||
|
||||
;; Same so that be can build cross products
|
||||
|
||||
(define_mode_iterator FIXED_B
|
||||
[(QQ "") (UQQ "")
|
||||
(HQ "") (UHQ "") (HA "") (UHA "")
|
||||
(SQ "") (USQ "") (SA "") (USA "")
|
||||
(DQ "") (UDQ "") (DA "") (UDA "")
|
||||
(TA "") (UTA "")
|
||||
(QI "") (HI "") (SI "") (DI "")])
|
||||
|
||||
(define_insn "fract<FIXED_B:mode><FIXED_A:mode>2"
|
||||
[(set (match_operand:FIXED_A 0 "register_operand" "=r")
|
||||
(fract_convert:FIXED_A
|
||||
(match_operand:FIXED_B 1 "register_operand" "r")))]
|
||||
"<FIXED_B:MODE>mode != <FIXED_A:MODE>mode"
|
||||
{
|
||||
return avr_out_fract (insn, operands, true, NULL);
|
||||
}
|
||||
[(set_attr "cc" "clobber")
|
||||
(set_attr "adjust_len" "sfract")])
|
||||
|
||||
(define_insn "fractuns<FIXED_B:mode><FIXED_A:mode>2"
|
||||
[(set (match_operand:FIXED_A 0 "register_operand" "=r")
|
||||
(unsigned_fract_convert:FIXED_A
|
||||
(match_operand:FIXED_B 1 "register_operand" "r")))]
|
||||
"<FIXED_B:MODE>mode != <FIXED_A:MODE>mode"
|
||||
{
|
||||
return avr_out_fract (insn, operands, false, NULL);
|
||||
}
|
||||
[(set_attr "cc" "clobber")
|
||||
(set_attr "adjust_len" "ufract")])
|
||||
|
||||
;******************************************************************************
|
||||
; mul
|
||||
|
||||
;; "mulqq3" "muluqq3"
|
||||
(define_expand "mul<mode>3"
|
||||
[(parallel [(match_operand:ALL1Q 0 "register_operand" "")
|
||||
(match_operand:ALL1Q 1 "register_operand" "")
|
||||
(match_operand:ALL1Q 2 "register_operand" "")])]
|
||||
""
|
||||
{
|
||||
emit_insn (AVR_HAVE_MUL
|
||||
? gen_mul<mode>3_enh (operands[0], operands[1], operands[2])
|
||||
: gen_mul<mode>3_nomul (operands[0], operands[1], operands[2]));
|
||||
DONE;
|
||||
})
|
||||
|
||||
(define_insn "mulqq3_enh"
|
||||
[(set (match_operand:QQ 0 "register_operand" "=r")
|
||||
(mult:QQ (match_operand:QQ 1 "register_operand" "a")
|
||||
(match_operand:QQ 2 "register_operand" "a")))]
|
||||
"AVR_HAVE_MUL"
|
||||
"fmuls %1,%2\;dec r1\;brvs 0f\;inc r1\;0:\;mov %0,r1\;clr __zero_reg__"
|
||||
[(set_attr "length" "6")
|
||||
(set_attr "cc" "clobber")])
|
||||
|
||||
(define_insn "muluqq3_enh"
|
||||
[(set (match_operand:UQQ 0 "register_operand" "=r")
|
||||
(mult:UQQ (match_operand:UQQ 1 "register_operand" "r")
|
||||
(match_operand:UQQ 2 "register_operand" "r")))]
|
||||
"AVR_HAVE_MUL"
|
||||
"mul %1,%2\;mov %0,r1\;clr __zero_reg__"
|
||||
[(set_attr "length" "3")
|
||||
(set_attr "cc" "clobber")])
|
||||
|
||||
(define_expand "mulqq3_nomul"
|
||||
[(set (reg:QQ 24)
|
||||
(match_operand:QQ 1 "register_operand" ""))
|
||||
(set (reg:QQ 25)
|
||||
(match_operand:QQ 2 "register_operand" ""))
|
||||
;; "*mulqq3.call"
|
||||
(parallel [(set (reg:QQ 23)
|
||||
(mult:QQ (reg:QQ 24)
|
||||
(reg:QQ 25)))
|
||||
(clobber (reg:QI 22))
|
||||
(clobber (reg:HI 24))])
|
||||
(set (match_operand:QQ 0 "register_operand" "")
|
||||
(reg:QQ 23))]
|
||||
"!AVR_HAVE_MUL")
|
||||
|
||||
(define_expand "muluqq3_nomul"
|
||||
[(set (reg:UQQ 22)
|
||||
(match_operand:UQQ 1 "register_operand" ""))
|
||||
(set (reg:UQQ 24)
|
||||
(match_operand:UQQ 2 "register_operand" ""))
|
||||
;; "*umulqihi3.call"
|
||||
(parallel [(set (reg:HI 24)
|
||||
(mult:HI (zero_extend:HI (reg:QI 22))
|
||||
(zero_extend:HI (reg:QI 24))))
|
||||
(clobber (reg:QI 21))
|
||||
(clobber (reg:HI 22))])
|
||||
(set (match_operand:UQQ 0 "register_operand" "")
|
||||
(reg:UQQ 25))]
|
||||
"!AVR_HAVE_MUL")
|
||||
|
||||
(define_insn "*mulqq3.call"
|
||||
[(set (reg:QQ 23)
|
||||
(mult:QQ (reg:QQ 24)
|
||||
(reg:QQ 25)))
|
||||
(clobber (reg:QI 22))
|
||||
(clobber (reg:HI 24))]
|
||||
"!AVR_HAVE_MUL"
|
||||
"%~call __mulqq3"
|
||||
[(set_attr "type" "xcall")
|
||||
(set_attr "cc" "clobber")])
|
||||
|
||||
|
||||
;; "mulhq3" "muluhq3"
|
||||
;; "mulha3" "muluha3"
|
||||
(define_expand "mul<mode>3"
|
||||
[(set (reg:ALL2QA 18)
|
||||
(match_operand:ALL2QA 1 "register_operand" ""))
|
||||
(set (reg:ALL2QA 26)
|
||||
(match_operand:ALL2QA 2 "register_operand" ""))
|
||||
;; "*mulhq3.call.enh"
|
||||
(parallel [(set (reg:ALL2QA 24)
|
||||
(mult:ALL2QA (reg:ALL2QA 18)
|
||||
(reg:ALL2QA 26)))
|
||||
(clobber (reg:HI 22))])
|
||||
(set (match_operand:ALL2QA 0 "register_operand" "")
|
||||
(reg:ALL2QA 24))]
|
||||
"AVR_HAVE_MUL")
|
||||
|
||||
;; "*mulhq3.call" "*muluhq3.call"
|
||||
;; "*mulha3.call" "*muluha3.call"
|
||||
(define_insn "*mul<mode>3.call"
|
||||
[(set (reg:ALL2QA 24)
|
||||
(mult:ALL2QA (reg:ALL2QA 18)
|
||||
(reg:ALL2QA 26)))
|
||||
(clobber (reg:HI 22))]
|
||||
"AVR_HAVE_MUL"
|
||||
"%~call __mul<mode>3"
|
||||
[(set_attr "type" "xcall")
|
||||
(set_attr "cc" "clobber")])
|
||||
|
||||
|
||||
;; On the enhanced core, don't clobber either input and use a separate output
|
||||
|
||||
;; "mulsa3" "mulusa3"
|
||||
(define_expand "mul<mode>3"
|
||||
[(set (reg:ALL4A 16)
|
||||
(match_operand:ALL4A 1 "register_operand" ""))
|
||||
(set (reg:ALL4A 20)
|
||||
(match_operand:ALL4A 2 "register_operand" ""))
|
||||
(set (reg:ALL4A 24)
|
||||
(mult:ALL4A (reg:ALL4A 16)
|
||||
(reg:ALL4A 20)))
|
||||
(set (match_operand:ALL4A 0 "register_operand" "")
|
||||
(reg:ALL4A 24))]
|
||||
"AVR_HAVE_MUL")
|
||||
|
||||
;; "*mulsa3.call" "*mulusa3.call"
|
||||
(define_insn "*mul<mode>3.call"
|
||||
[(set (reg:ALL4A 24)
|
||||
(mult:ALL4A (reg:ALL4A 16)
|
||||
(reg:ALL4A 20)))]
|
||||
"AVR_HAVE_MUL"
|
||||
"%~call __mul<mode>3"
|
||||
[(set_attr "type" "xcall")
|
||||
(set_attr "cc" "clobber")])
|
||||
|
||||
; / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / /
|
||||
; div
|
||||
|
||||
(define_code_iterator usdiv [udiv div])
|
||||
|
||||
;; "divqq3" "udivuqq3"
|
||||
(define_expand "<code><mode>3"
|
||||
[(set (reg:ALL1Q 25)
|
||||
(match_operand:ALL1Q 1 "register_operand" ""))
|
||||
(set (reg:ALL1Q 22)
|
||||
(match_operand:ALL1Q 2 "register_operand" ""))
|
||||
(parallel [(set (reg:ALL1Q 24)
|
||||
(usdiv:ALL1Q (reg:ALL1Q 25)
|
||||
(reg:ALL1Q 22)))
|
||||
(clobber (reg:QI 25))])
|
||||
(set (match_operand:ALL1Q 0 "register_operand" "")
|
||||
(reg:ALL1Q 24))])
|
||||
|
||||
;; "*divqq3.call" "*udivuqq3.call"
|
||||
(define_insn "*<code><mode>3.call"
|
||||
[(set (reg:ALL1Q 24)
|
||||
(usdiv:ALL1Q (reg:ALL1Q 25)
|
||||
(reg:ALL1Q 22)))
|
||||
(clobber (reg:QI 25))]
|
||||
""
|
||||
"%~call __<code><mode>3"
|
||||
[(set_attr "type" "xcall")
|
||||
(set_attr "cc" "clobber")])
|
||||
|
||||
;; "divhq3" "udivuhq3"
|
||||
;; "divha3" "udivuha3"
|
||||
(define_expand "<code><mode>3"
|
||||
[(set (reg:ALL2QA 26)
|
||||
(match_operand:ALL2QA 1 "register_operand" ""))
|
||||
(set (reg:ALL2QA 22)
|
||||
(match_operand:ALL2QA 2 "register_operand" ""))
|
||||
(parallel [(set (reg:ALL2QA 24)
|
||||
(usdiv:ALL2QA (reg:ALL2QA 26)
|
||||
(reg:ALL2QA 22)))
|
||||
(clobber (reg:HI 26))
|
||||
(clobber (reg:QI 21))])
|
||||
(set (match_operand:ALL2QA 0 "register_operand" "")
|
||||
(reg:ALL2QA 24))])
|
||||
|
||||
;; "*divhq3.call" "*udivuhq3.call"
|
||||
;; "*divha3.call" "*udivuha3.call"
|
||||
(define_insn "*<code><mode>3.call"
|
||||
[(set (reg:ALL2QA 24)
|
||||
(usdiv:ALL2QA (reg:ALL2QA 26)
|
||||
(reg:ALL2QA 22)))
|
||||
(clobber (reg:HI 26))
|
||||
(clobber (reg:QI 21))]
|
||||
""
|
||||
"%~call __<code><mode>3"
|
||||
[(set_attr "type" "xcall")
|
||||
(set_attr "cc" "clobber")])
|
||||
|
||||
;; Note the first parameter gets passed in already offset by 2 bytes
|
||||
|
||||
;; "divsa3" "udivusa3"
|
||||
(define_expand "<code><mode>3"
|
||||
[(set (reg:ALL4A 24)
|
||||
(match_operand:ALL4A 1 "register_operand" ""))
|
||||
(set (reg:ALL4A 18)
|
||||
(match_operand:ALL4A 2 "register_operand" ""))
|
||||
(parallel [(set (reg:ALL4A 22)
|
||||
(usdiv:ALL4A (reg:ALL4A 24)
|
||||
(reg:ALL4A 18)))
|
||||
(clobber (reg:HI 26))
|
||||
(clobber (reg:HI 30))])
|
||||
(set (match_operand:ALL4A 0 "register_operand" "")
|
||||
(reg:ALL4A 22))])
|
||||
|
||||
;; "*divsa3.call" "*udivusa3.call"
|
||||
(define_insn "*<code><mode>3.call"
|
||||
[(set (reg:ALL4A 22)
|
||||
(usdiv:ALL4A (reg:ALL4A 24)
|
||||
(reg:ALL4A 18)))
|
||||
(clobber (reg:HI 26))
|
||||
(clobber (reg:HI 30))]
|
||||
""
|
||||
"%~call __<code><mode>3"
|
||||
[(set_attr "type" "xcall")
|
||||
(set_attr "cc" "clobber")])
|
@ -1 +1,28 @@
|
||||
FRACTIONAL_INT_MODE (PSI, 24, 3);
|
||||
|
||||
/* On 8 bit machines it requires fewer instructions for fixed point
|
||||
routines if the decimal place is on a byte boundary which is not
|
||||
the default for signed accum types. */
|
||||
|
||||
ADJUST_IBIT (HA, 7);
|
||||
ADJUST_FBIT (HA, 8);
|
||||
|
||||
ADJUST_IBIT (SA, 15);
|
||||
ADJUST_FBIT (SA, 16);
|
||||
|
||||
ADJUST_IBIT (DA, 31);
|
||||
ADJUST_FBIT (DA, 32);
|
||||
|
||||
/* Make TA and UTA 64 bits wide.
|
||||
128 bit wide modes would be insane on a 8-bit machine.
|
||||
This needs special treatment in avr.c and avr-lib.h. */
|
||||
|
||||
ADJUST_BYTESIZE (TA, 8);
|
||||
ADJUST_ALIGNMENT (TA, 1);
|
||||
ADJUST_IBIT (TA, 15);
|
||||
ADJUST_FBIT (TA, 48);
|
||||
|
||||
ADJUST_BYTESIZE (UTA, 8);
|
||||
ADJUST_ALIGNMENT (UTA, 1);
|
||||
ADJUST_IBIT (UTA, 16);
|
||||
ADJUST_FBIT (UTA, 48);
|
||||
|
@ -79,6 +79,9 @@ extern const char* avr_load_lpm (rtx, rtx*, int*);
|
||||
|
||||
extern bool avr_rotate_bytes (rtx operands[]);
|
||||
|
||||
extern const char* avr_out_fract (rtx, rtx[], bool, int*);
|
||||
extern rtx avr_to_int_mode (rtx);
|
||||
|
||||
extern void expand_prologue (void);
|
||||
extern void expand_epilogue (bool);
|
||||
extern bool avr_emit_movmemhi (rtx*);
|
||||
@ -92,6 +95,8 @@ extern const char* avr_out_plus (rtx*, int*, int*);
|
||||
extern const char* avr_out_plus_noclobber (rtx*, int*, int*);
|
||||
extern const char* avr_out_plus64 (rtx, int*);
|
||||
extern const char* avr_out_addto_sp (rtx*, int*);
|
||||
extern const char* avr_out_minus (rtx*, int*, int*);
|
||||
extern const char* avr_out_minus64 (rtx, int*);
|
||||
extern const char* avr_out_xload (rtx, rtx*, int*);
|
||||
extern const char* avr_out_movmem (rtx, rtx*, int*);
|
||||
extern const char* avr_out_insert_bits (rtx*, int*);
|
||||
|
@ -49,6 +49,10 @@
|
||||
#include "params.h"
|
||||
#include "df.h"
|
||||
|
||||
#ifndef CONST_FIXED_P
|
||||
#define CONST_FIXED_P(X) (CONST_FIXED == GET_CODE (X))
|
||||
#endif
|
||||
|
||||
/* Maximal allowed offset for an address in the LD command */
|
||||
#define MAX_LD_OFFSET(MODE) (64 - (signed)GET_MODE_SIZE (MODE))
|
||||
|
||||
@ -264,6 +268,23 @@ avr_popcount_each_byte (rtx xval, int n_bytes, int pop_mask)
|
||||
return true;
|
||||
}
|
||||
|
||||
|
||||
/* Access some RTX as INT_MODE. If X is a CONST_FIXED we can get
|
||||
the bit representation of X by "casting" it to CONST_INT. */
|
||||
|
||||
rtx
|
||||
avr_to_int_mode (rtx x)
|
||||
{
|
||||
enum machine_mode mode = GET_MODE (x);
|
||||
|
||||
return VOIDmode == mode
|
||||
? x
|
||||
: simplify_gen_subreg (int_mode_for_mode (mode), x, mode, 0);
|
||||
}
|
||||
|
||||
|
||||
/* Implement `TARGET_OPTION_OVERRIDE'. */
|
||||
|
||||
static void
|
||||
avr_option_override (void)
|
||||
{
|
||||
@ -389,9 +410,14 @@ avr_regno_reg_class (int r)
|
||||
}
|
||||
|
||||
|
||||
/* Implement `TARGET_SCALAR_MODE_SUPPORTED_P'. */
|
||||
|
||||
static bool
|
||||
avr_scalar_mode_supported_p (enum machine_mode mode)
|
||||
{
|
||||
if (ALL_FIXED_POINT_MODE_P (mode))
|
||||
return true;
|
||||
|
||||
if (PSImode == mode)
|
||||
return true;
|
||||
|
||||
@ -715,6 +741,58 @@ avr_initial_elimination_offset (int from, int to)
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
/* Helper for the function below. */
|
||||
|
||||
static void
|
||||
avr_adjust_type_node (tree *node, enum machine_mode mode, int sat_p)
|
||||
{
|
||||
*node = make_node (FIXED_POINT_TYPE);
|
||||
TYPE_SATURATING (*node) = sat_p;
|
||||
TYPE_UNSIGNED (*node) = UNSIGNED_FIXED_POINT_MODE_P (mode);
|
||||
TYPE_IBIT (*node) = GET_MODE_IBIT (mode);
|
||||
TYPE_FBIT (*node) = GET_MODE_FBIT (mode);
|
||||
TYPE_PRECISION (*node) = GET_MODE_BITSIZE (mode);
|
||||
TYPE_ALIGN (*node) = 8;
|
||||
SET_TYPE_MODE (*node, mode);
|
||||
|
||||
layout_type (*node);
|
||||
}
|
||||
|
||||
|
||||
/* Implement `TARGET_BUILD_BUILTIN_VA_LIST'. */
|
||||
|
||||
static tree
|
||||
avr_build_builtin_va_list (void)
|
||||
{
|
||||
/* avr-modes.def adjusts [U]TA to be 64-bit modes with 48 fractional bits.
|
||||
This is more appropriate for the 8-bit machine AVR than 128-bit modes.
|
||||
The ADJUST_IBIT/FBIT are handled in toplev:init_adjust_machine_modes()
|
||||
which is auto-generated by genmodes, but the compiler assigns [U]DAmode
|
||||
to the long long accum modes instead of the desired [U]TAmode.
|
||||
|
||||
Fix this now, right after node setup in tree.c:build_common_tree_nodes().
|
||||
This must run before c-cppbuiltin.c:builtin_define_fixed_point_constants()
|
||||
which built-in defines macros like __ULLACCUM_FBIT__ that are used by
|
||||
libgcc to detect IBIT and FBIT. */
|
||||
|
||||
avr_adjust_type_node (&ta_type_node, TAmode, 0);
|
||||
avr_adjust_type_node (&uta_type_node, UTAmode, 0);
|
||||
avr_adjust_type_node (&sat_ta_type_node, TAmode, 1);
|
||||
avr_adjust_type_node (&sat_uta_type_node, UTAmode, 1);
|
||||
|
||||
unsigned_long_long_accum_type_node = uta_type_node;
|
||||
long_long_accum_type_node = ta_type_node;
|
||||
sat_unsigned_long_long_accum_type_node = sat_uta_type_node;
|
||||
sat_long_long_accum_type_node = sat_ta_type_node;
|
||||
|
||||
/* Dispatch to the default handler. */
|
||||
|
||||
return std_build_builtin_va_list ();
|
||||
}
|
||||
|
||||
|
||||
/* Implement `TARGET_BUILTIN_SETJMP_FRAME_VALUE'. */
|
||||
/* Actual start of frame is virtual_stack_vars_rtx this is offset from
|
||||
frame pointer by +STARTING_FRAME_OFFSET.
|
||||
Using saved frame = virtual_stack_vars_rtx - STARTING_FRAME_OFFSET
|
||||
@ -723,10 +801,13 @@ avr_initial_elimination_offset (int from, int to)
|
||||
static rtx
|
||||
avr_builtin_setjmp_frame_value (void)
|
||||
{
|
||||
return gen_rtx_MINUS (Pmode, virtual_stack_vars_rtx,
|
||||
gen_int_mode (STARTING_FRAME_OFFSET, Pmode));
|
||||
rtx xval = gen_reg_rtx (Pmode);
|
||||
emit_insn (gen_subhi3 (xval, virtual_stack_vars_rtx,
|
||||
gen_int_mode (STARTING_FRAME_OFFSET, Pmode)));
|
||||
return xval;
|
||||
}
|
||||
|
||||
|
||||
/* Return contents of MEM at frame pointer + stack size + 1 (+2 if 3 byte PC).
|
||||
This is return address of function. */
|
||||
rtx
|
||||
@ -1580,7 +1661,7 @@ avr_legitimate_address_p (enum machine_mode mode, rtx x, bool strict)
|
||||
MEM, strict);
|
||||
|
||||
if (strict
|
||||
&& DImode == mode
|
||||
&& GET_MODE_SIZE (mode) > 4
|
||||
&& REG_X == REGNO (x))
|
||||
{
|
||||
ok = false;
|
||||
@ -2081,6 +2162,14 @@ avr_print_operand (FILE *file, rtx x, int code)
|
||||
/* Use normal symbol for direct address no linker trampoline needed */
|
||||
output_addr_const (file, x);
|
||||
}
|
||||
else if (GET_CODE (x) == CONST_FIXED)
|
||||
{
|
||||
HOST_WIDE_INT ival = INTVAL (avr_to_int_mode (x));
|
||||
if (code != 0)
|
||||
output_operand_lossage ("Unsupported code '%c'for fixed-point:",
|
||||
code);
|
||||
fprintf (file, HOST_WIDE_INT_PRINT_DEC, ival);
|
||||
}
|
||||
else if (GET_CODE (x) == CONST_DOUBLE)
|
||||
{
|
||||
long val;
|
||||
@ -2116,6 +2205,7 @@ notice_update_cc (rtx body ATTRIBUTE_UNUSED, rtx insn)
|
||||
|
||||
case CC_OUT_PLUS:
|
||||
case CC_OUT_PLUS_NOCLOBBER:
|
||||
case CC_MINUS:
|
||||
case CC_LDI:
|
||||
{
|
||||
rtx *op = recog_data.operand;
|
||||
@ -2139,6 +2229,11 @@ notice_update_cc (rtx body ATTRIBUTE_UNUSED, rtx insn)
|
||||
cc = (enum attr_cc) icc;
|
||||
break;
|
||||
|
||||
case CC_MINUS:
|
||||
avr_out_minus (op, &len_dummy, &icc);
|
||||
cc = (enum attr_cc) icc;
|
||||
break;
|
||||
|
||||
case CC_LDI:
|
||||
|
||||
cc = (op[1] == CONST0_RTX (GET_MODE (op[0]))
|
||||
@ -2779,9 +2874,11 @@ output_movqi (rtx insn, rtx operands[], int *real_l)
|
||||
if (real_l)
|
||||
*real_l = 1;
|
||||
|
||||
if (register_operand (dest, QImode))
|
||||
gcc_assert (1 == GET_MODE_SIZE (GET_MODE (dest)));
|
||||
|
||||
if (REG_P (dest))
|
||||
{
|
||||
if (register_operand (src, QImode)) /* mov r,r */
|
||||
if (REG_P (src)) /* mov r,r */
|
||||
{
|
||||
if (test_hard_reg_class (STACK_REG, dest))
|
||||
return "out %0,%1";
|
||||
@ -2803,7 +2900,7 @@ output_movqi (rtx insn, rtx operands[], int *real_l)
|
||||
rtx xop[2];
|
||||
|
||||
xop[0] = dest;
|
||||
xop[1] = src == const0_rtx ? zero_reg_rtx : src;
|
||||
xop[1] = src == CONST0_RTX (GET_MODE (dest)) ? zero_reg_rtx : src;
|
||||
|
||||
return out_movqi_mr_r (insn, xop, real_l);
|
||||
}
|
||||
@ -2825,6 +2922,8 @@ output_movhi (rtx insn, rtx xop[], int *plen)
|
||||
return avr_out_lpm (insn, xop, plen);
|
||||
}
|
||||
|
||||
gcc_assert (2 == GET_MODE_SIZE (GET_MODE (dest)));
|
||||
|
||||
if (REG_P (dest))
|
||||
{
|
||||
if (REG_P (src)) /* mov r,r */
|
||||
@ -2843,7 +2942,6 @@ output_movhi (rtx insn, rtx xop[], int *plen)
|
||||
return TARGET_NO_INTERRUPTS
|
||||
? avr_asm_len ("out __SP_H__,%B1" CR_TAB
|
||||
"out __SP_L__,%A1", xop, plen, -2)
|
||||
|
||||
: avr_asm_len ("in __tmp_reg__,__SREG__" CR_TAB
|
||||
"cli" CR_TAB
|
||||
"out __SP_H__,%B1" CR_TAB
|
||||
@ -2880,7 +2978,7 @@ output_movhi (rtx insn, rtx xop[], int *plen)
|
||||
rtx xop[2];
|
||||
|
||||
xop[0] = dest;
|
||||
xop[1] = src == const0_rtx ? zero_reg_rtx : src;
|
||||
xop[1] = src == CONST0_RTX (GET_MODE (dest)) ? zero_reg_rtx : src;
|
||||
|
||||
return out_movhi_mr_r (insn, xop, plen);
|
||||
}
|
||||
@ -3403,9 +3501,10 @@ output_movsisf (rtx insn, rtx operands[], int *l)
|
||||
if (!l)
|
||||
l = &dummy;
|
||||
|
||||
if (register_operand (dest, VOIDmode))
|
||||
gcc_assert (4 == GET_MODE_SIZE (GET_MODE (dest)));
|
||||
if (REG_P (dest))
|
||||
{
|
||||
if (register_operand (src, VOIDmode)) /* mov r,r */
|
||||
if (REG_P (src)) /* mov r,r */
|
||||
{
|
||||
if (true_regnum (dest) > true_regnum (src))
|
||||
{
|
||||
@ -3440,10 +3539,10 @@ output_movsisf (rtx insn, rtx operands[], int *l)
|
||||
{
|
||||
return output_reload_insisf (operands, NULL_RTX, real_l);
|
||||
}
|
||||
else if (GET_CODE (src) == MEM)
|
||||
else if (MEM_P (src))
|
||||
return out_movsi_r_mr (insn, operands, real_l); /* mov r,m */
|
||||
}
|
||||
else if (GET_CODE (dest) == MEM)
|
||||
else if (MEM_P (dest))
|
||||
{
|
||||
const char *templ;
|
||||
|
||||
@ -4126,14 +4225,25 @@ avr_out_compare (rtx insn, rtx *xop, int *plen)
|
||||
rtx xval = xop[1];
|
||||
|
||||
/* MODE of the comparison. */
|
||||
enum machine_mode mode = GET_MODE (xreg);
|
||||
enum machine_mode mode;
|
||||
|
||||
/* Number of bytes to operate on. */
|
||||
int i, n_bytes = GET_MODE_SIZE (mode);
|
||||
int i, n_bytes = GET_MODE_SIZE (GET_MODE (xreg));
|
||||
|
||||
/* Value (0..0xff) held in clobber register xop[2] or -1 if unknown. */
|
||||
int clobber_val = -1;
|
||||
|
||||
/* Map fixed mode operands to integer operands with the same binary
|
||||
representation. They are easier to handle in the remainder. */
|
||||
|
||||
if (CONST_FIXED == GET_CODE (xval))
|
||||
{
|
||||
xreg = avr_to_int_mode (xop[0]);
|
||||
xval = avr_to_int_mode (xop[1]);
|
||||
}
|
||||
|
||||
mode = GET_MODE (xreg);
|
||||
|
||||
gcc_assert (REG_P (xreg));
|
||||
gcc_assert ((CONST_INT_P (xval) && n_bytes <= 4)
|
||||
|| (const_double_operand (xval, VOIDmode) && n_bytes == 8));
|
||||
@ -4143,7 +4253,7 @@ avr_out_compare (rtx insn, rtx *xop, int *plen)
|
||||
|
||||
/* Comparisons == +/-1 and != +/-1 can be done similar to camparing
|
||||
against 0 by ORing the bytes. This is one instruction shorter.
|
||||
Notice that DImode comparisons are always against reg:DI 18
|
||||
Notice that 64-bit comparisons are always against reg:ALL8 18 (ACC_A)
|
||||
and therefore don't use this. */
|
||||
|
||||
if (!test_hard_reg_class (LD_REGS, xreg)
|
||||
@ -5884,6 +5994,9 @@ avr_out_plus_1 (rtx *xop, int *plen, enum rtx_code code, int *pcc)
|
||||
/* MODE of the operation. */
|
||||
enum machine_mode mode = GET_MODE (xop[0]);
|
||||
|
||||
/* INT_MODE of the same size. */
|
||||
enum machine_mode imode = int_mode_for_mode (mode);
|
||||
|
||||
/* Number of bytes to operate on. */
|
||||
int i, n_bytes = GET_MODE_SIZE (mode);
|
||||
|
||||
@ -5908,8 +6021,11 @@ avr_out_plus_1 (rtx *xop, int *plen, enum rtx_code code, int *pcc)
|
||||
|
||||
*pcc = (MINUS == code) ? CC_SET_CZN : CC_CLOBBER;
|
||||
|
||||
if (CONST_FIXED_P (xval))
|
||||
xval = avr_to_int_mode (xval);
|
||||
|
||||
if (MINUS == code)
|
||||
xval = simplify_unary_operation (NEG, mode, xval, mode);
|
||||
xval = simplify_unary_operation (NEG, imode, xval, imode);
|
||||
|
||||
op[2] = xop[3];
|
||||
|
||||
@ -5920,7 +6036,7 @@ avr_out_plus_1 (rtx *xop, int *plen, enum rtx_code code, int *pcc)
|
||||
{
|
||||
/* We operate byte-wise on the destination. */
|
||||
rtx reg8 = simplify_gen_subreg (QImode, xop[0], mode, i);
|
||||
rtx xval8 = simplify_gen_subreg (QImode, xval, mode, i);
|
||||
rtx xval8 = simplify_gen_subreg (QImode, xval, imode, i);
|
||||
|
||||
/* 8-bit value to operate with this byte. */
|
||||
unsigned int val8 = UINTVAL (xval8) & GET_MODE_MASK (QImode);
|
||||
@ -5941,7 +6057,7 @@ avr_out_plus_1 (rtx *xop, int *plen, enum rtx_code code, int *pcc)
|
||||
&& i + 2 <= n_bytes
|
||||
&& test_hard_reg_class (ADDW_REGS, reg8))
|
||||
{
|
||||
rtx xval16 = simplify_gen_subreg (HImode, xval, mode, i);
|
||||
rtx xval16 = simplify_gen_subreg (HImode, xval, imode, i);
|
||||
unsigned int val16 = UINTVAL (xval16) & GET_MODE_MASK (HImode);
|
||||
|
||||
/* Registers R24, X, Y, Z can use ADIW/SBIW with constants < 64
|
||||
@ -6085,6 +6201,41 @@ avr_out_plus_noclobber (rtx *xop, int *plen, int *pcc)
|
||||
}
|
||||
|
||||
|
||||
/* Output subtraction of register XOP[0] and compile time constant XOP[2]:
|
||||
|
||||
XOP[0] = XOP[0] - XOP[2]
|
||||
|
||||
This is basically the same as `avr_out_plus' except that we subtract.
|
||||
It's needed because (minus x const) is not mapped to (plus x -const)
|
||||
for the fixed point modes. */
|
||||
|
||||
const char*
|
||||
avr_out_minus (rtx *xop, int *plen, int *pcc)
|
||||
{
|
||||
rtx op[4];
|
||||
|
||||
if (pcc)
|
||||
*pcc = (int) CC_SET_CZN;
|
||||
|
||||
if (REG_P (xop[2]))
|
||||
return avr_asm_len ("sub %A0,%A2" CR_TAB
|
||||
"sbc %B0,%B2", xop, plen, -2);
|
||||
|
||||
if (!CONST_INT_P (xop[2])
|
||||
&& !CONST_FIXED_P (xop[2]))
|
||||
return avr_asm_len ("subi %A0,lo8(%2)" CR_TAB
|
||||
"sbci %B0,hi8(%2)", xop, plen, -2);
|
||||
|
||||
op[0] = avr_to_int_mode (xop[0]);
|
||||
op[1] = avr_to_int_mode (xop[1]);
|
||||
op[2] = gen_int_mode (-INTVAL (avr_to_int_mode (xop[2])),
|
||||
GET_MODE (op[0]));
|
||||
op[3] = xop[3];
|
||||
|
||||
return avr_out_plus (op, plen, pcc);
|
||||
}
|
||||
|
||||
|
||||
/* Prepare operands of adddi3_const_insn to be used with avr_out_plus_1. */
|
||||
|
||||
const char*
|
||||
@ -6103,6 +6254,19 @@ avr_out_plus64 (rtx addend, int *plen)
|
||||
return "";
|
||||
}
|
||||
|
||||
|
||||
/* Prepare operands of subdi3_const_insn to be used with avr_out_plus64. */
|
||||
|
||||
const char*
|
||||
avr_out_minus64 (rtx subtrahend, int *plen)
|
||||
{
|
||||
rtx xneg = avr_to_int_mode (subtrahend);
|
||||
xneg = simplify_unary_operation (NEG, DImode, xneg, DImode);
|
||||
|
||||
return avr_out_plus64 (xneg, plen);
|
||||
}
|
||||
|
||||
|
||||
/* Output bit operation (IOR, AND, XOR) with register XOP[0] and compile
|
||||
time constant XOP[2]:
|
||||
|
||||
@ -6442,6 +6606,349 @@ avr_rotate_bytes (rtx operands[])
|
||||
return true;
|
||||
}
|
||||
|
||||
|
||||
/* Outputs instructions needed for fixed point type conversion.
|
||||
This includes converting between any fixed point type, as well
|
||||
as converting to any integer type. Conversion between integer
|
||||
types is not supported.
|
||||
|
||||
The number of instructions generated depends on the types
|
||||
being converted and the registers assigned to them.
|
||||
|
||||
The number of instructions required to complete the conversion
|
||||
is least if the registers for source and destination are overlapping
|
||||
and are aligned at the decimal place as actual movement of data is
|
||||
completely avoided. In some cases, the conversion may already be
|
||||
complete without any instructions needed.
|
||||
|
||||
When converting to signed types from signed types, sign extension
|
||||
is implemented.
|
||||
|
||||
Converting signed fractional types requires a bit shift if converting
|
||||
to or from any unsigned fractional type because the decimal place is
|
||||
shifted by 1 bit. When the destination is a signed fractional, the sign
|
||||
is stored in either the carry or T bit. */
|
||||
|
||||
const char*
|
||||
avr_out_fract (rtx insn, rtx operands[], bool intsigned, int *plen)
|
||||
{
|
||||
int i;
|
||||
bool sbit[2];
|
||||
/* ilen: Length of integral part (in bytes)
|
||||
flen: Length of fractional part (in bytes)
|
||||
tlen: Length of operand (in bytes)
|
||||
blen: Length of operand (in bits) */
|
||||
int ilen[2], flen[2], tlen[2], blen[2];
|
||||
int rdest, rsource, offset;
|
||||
int start, end, dir;
|
||||
bool sign_in_T = false, sign_in_Carry = false, sign_done = false;
|
||||
bool widening_sign_extend = false;
|
||||
int clrword = -1, lastclr = 0, clr = 0;
|
||||
rtx xop[6];
|
||||
|
||||
const int dest = 0;
|
||||
const int src = 1;
|
||||
|
||||
xop[dest] = operands[dest];
|
||||
xop[src] = operands[src];
|
||||
|
||||
if (plen)
|
||||
*plen = 0;
|
||||
|
||||
/* Determine format (integer and fractional parts)
|
||||
of types needing conversion. */
|
||||
|
||||
for (i = 0; i < 2; i++)
|
||||
{
|
||||
enum machine_mode mode = GET_MODE (xop[i]);
|
||||
|
||||
tlen[i] = GET_MODE_SIZE (mode);
|
||||
blen[i] = GET_MODE_BITSIZE (mode);
|
||||
|
||||
if (SCALAR_INT_MODE_P (mode))
|
||||
{
|
||||
sbit[i] = intsigned;
|
||||
ilen[i] = GET_MODE_SIZE (mode);
|
||||
flen[i] = 0;
|
||||
}
|
||||
else if (ALL_SCALAR_FIXED_POINT_MODE_P (mode))
|
||||
{
|
||||
sbit[i] = SIGNED_SCALAR_FIXED_POINT_MODE_P (mode);
|
||||
ilen[i] = (GET_MODE_IBIT (mode) + 1) / 8;
|
||||
flen[i] = (GET_MODE_FBIT (mode) + 1) / 8;
|
||||
}
|
||||
else
|
||||
fatal_insn ("unsupported fixed-point conversion", insn);
|
||||
}
|
||||
|
||||
/* Perform sign extension if source and dest are both signed,
|
||||
and there are more integer parts in dest than in source. */
|
||||
|
||||
widening_sign_extend = sbit[dest] && sbit[src] && ilen[dest] > ilen[src];
|
||||
|
||||
rdest = REGNO (xop[dest]);
|
||||
rsource = REGNO (xop[src]);
|
||||
offset = flen[src] - flen[dest];
|
||||
|
||||
/* Position of MSB resp. sign bit. */
|
||||
|
||||
xop[2] = GEN_INT (blen[dest] - 1);
|
||||
xop[3] = GEN_INT (blen[src] - 1);
|
||||
|
||||
/* Store the sign bit if the destination is a signed fract and the source
|
||||
has a sign in the integer part. */
|
||||
|
||||
if (sbit[dest] && ilen[dest] == 0 && sbit[src] && ilen[src] > 0)
|
||||
{
|
||||
/* To avoid using BST and BLD if the source and destination registers
|
||||
overlap or the source is unused after, we can use LSL to store the
|
||||
sign bit in carry since we don't need the integral part of the source.
|
||||
Restoring the sign from carry saves one BLD instruction below. */
|
||||
|
||||
if (reg_unused_after (insn, xop[src])
|
||||
|| (rdest < rsource + tlen[src]
|
||||
&& rdest + tlen[dest] > rsource))
|
||||
{
|
||||
avr_asm_len ("lsl %T1%t3", xop, plen, 1);
|
||||
sign_in_Carry = true;
|
||||
}
|
||||
else
|
||||
{
|
||||
avr_asm_len ("bst %T1%T3", xop, plen, 1);
|
||||
sign_in_T = true;
|
||||
}
|
||||
}
|
||||
|
||||
/* Pick the correct direction to shift bytes. */
|
||||
|
||||
if (rdest < rsource + offset)
|
||||
{
|
||||
dir = 1;
|
||||
start = 0;
|
||||
end = tlen[dest];
|
||||
}
|
||||
else
|
||||
{
|
||||
dir = -1;
|
||||
start = tlen[dest] - 1;
|
||||
end = -1;
|
||||
}
|
||||
|
||||
/* Perform conversion by moving registers into place, clearing
|
||||
destination registers that do not overlap with any source. */
|
||||
|
||||
for (i = start; i != end; i += dir)
|
||||
{
|
||||
int destloc = rdest + i;
|
||||
int sourceloc = rsource + i + offset;
|
||||
|
||||
/* Source register location is outside range of source register,
|
||||
so clear this byte in the dest. */
|
||||
|
||||
if (sourceloc < rsource
|
||||
|| sourceloc >= rsource + tlen[src])
|
||||
{
|
||||
if (AVR_HAVE_MOVW
|
||||
&& i + dir != end
|
||||
&& (sourceloc + dir < rsource
|
||||
|| sourceloc + dir >= rsource + tlen[src])
|
||||
&& ((dir == 1 && !(destloc % 2) && !(sourceloc % 2))
|
||||
|| (dir == -1 && (destloc % 2) && (sourceloc % 2)))
|
||||
&& clrword != -1)
|
||||
{
|
||||
/* Use already cleared word to clear two bytes at a time. */
|
||||
|
||||
int even_i = i & ~1;
|
||||
int even_clrword = clrword & ~1;
|
||||
|
||||
xop[4] = GEN_INT (8 * even_i);
|
||||
xop[5] = GEN_INT (8 * even_clrword);
|
||||
avr_asm_len ("movw %T0%t4,%T0%t5", xop, plen, 1);
|
||||
i += dir;
|
||||
}
|
||||
else
|
||||
{
|
||||
if (i == tlen[dest] - 1
|
||||
&& widening_sign_extend
|
||||
&& blen[src] - 1 - 8 * offset < 0)
|
||||
{
|
||||
/* The SBRC below that sign-extends would come
|
||||
up with a negative bit number because the sign
|
||||
bit is out of reach. ALso avoid some early-clobber
|
||||
situations because of premature CLR. */
|
||||
|
||||
if (reg_unused_after (insn, xop[src]))
|
||||
avr_asm_len ("lsl %T1%t3" CR_TAB
|
||||
"sbc %T0%t2,%T0%t2", xop, plen, 2);
|
||||
else
|
||||
avr_asm_len ("mov __tmp_reg__,%T1%t3" CR_TAB
|
||||
"lsl __tmp_reg__" CR_TAB
|
||||
"sbc %T0%t2,%T0%t2", xop, plen, 3);
|
||||
sign_done = true;
|
||||
|
||||
continue;
|
||||
}
|
||||
|
||||
/* Do not clear the register if it is going to get
|
||||
sign extended with a MOV later. */
|
||||
|
||||
if (sbit[dest] && sbit[src]
|
||||
&& i != tlen[dest] - 1
|
||||
&& i >= flen[dest])
|
||||
{
|
||||
continue;
|
||||
}
|
||||
|
||||
xop[4] = GEN_INT (8 * i);
|
||||
avr_asm_len ("clr %T0%t4", xop, plen, 1);
|
||||
|
||||
/* If the last byte was cleared too, we have a cleared
|
||||
word we can MOVW to clear two bytes at a time. */
|
||||
|
||||
if (lastclr)
|
||||
clrword = i;
|
||||
|
||||
clr = 1;
|
||||
}
|
||||
}
|
||||
else if (destloc == sourceloc)
|
||||
{
|
||||
/* Source byte is already in destination: Nothing needed. */
|
||||
|
||||
continue;
|
||||
}
|
||||
else
|
||||
{
|
||||
/* Registers do not line up and source register location
|
||||
is within range: Perform move, shifting with MOV or MOVW. */
|
||||
|
||||
if (AVR_HAVE_MOVW
|
||||
&& i + dir != end
|
||||
&& sourceloc + dir >= rsource
|
||||
&& sourceloc + dir < rsource + tlen[src]
|
||||
&& ((dir == 1 && !(destloc % 2) && !(sourceloc % 2))
|
||||
|| (dir == -1 && (destloc % 2) && (sourceloc % 2))))
|
||||
{
|
||||
int even_i = i & ~1;
|
||||
int even_i_plus_offset = (i + offset) & ~1;
|
||||
|
||||
xop[4] = GEN_INT (8 * even_i);
|
||||
xop[5] = GEN_INT (8 * even_i_plus_offset);
|
||||
avr_asm_len ("movw %T0%t4,%T1%t5", xop, plen, 1);
|
||||
i += dir;
|
||||
}
|
||||
else
|
||||
{
|
||||
xop[4] = GEN_INT (8 * i);
|
||||
xop[5] = GEN_INT (8 * (i + offset));
|
||||
avr_asm_len ("mov %T0%t4,%T1%t5", xop, plen, 1);
|
||||
}
|
||||
}
|
||||
|
||||
lastclr = clr;
|
||||
clr = 0;
|
||||
}
|
||||
|
||||
/* Perform sign extension if source and dest are both signed,
|
||||
and there are more integer parts in dest than in source. */
|
||||
|
||||
if (widening_sign_extend)
|
||||
{
|
||||
if (!sign_done)
|
||||
{
|
||||
xop[4] = GEN_INT (blen[src] - 1 - 8 * offset);
|
||||
|
||||
/* Register was cleared above, so can become 0xff and extended.
|
||||
Note: Instead of the CLR/SBRC/COM the sign extension could
|
||||
be performed after the LSL below by means of a SBC if only
|
||||
one byte has to be shifted left. */
|
||||
|
||||
avr_asm_len ("sbrc %T0%T4" CR_TAB
|
||||
"com %T0%t2", xop, plen, 2);
|
||||
}
|
||||
|
||||
/* Sign extend additional bytes by MOV and MOVW. */
|
||||
|
||||
start = tlen[dest] - 2;
|
||||
end = flen[dest] + ilen[src] - 1;
|
||||
|
||||
for (i = start; i != end; i--)
|
||||
{
|
||||
if (AVR_HAVE_MOVW && i != start && i-1 != end)
|
||||
{
|
||||
i--;
|
||||
xop[4] = GEN_INT (8 * i);
|
||||
xop[5] = GEN_INT (8 * (tlen[dest] - 2));
|
||||
avr_asm_len ("movw %T0%t4,%T0%t5", xop, plen, 1);
|
||||
}
|
||||
else
|
||||
{
|
||||
xop[4] = GEN_INT (8 * i);
|
||||
xop[5] = GEN_INT (8 * (tlen[dest] - 1));
|
||||
avr_asm_len ("mov %T0%t4,%T0%t5", xop, plen, 1);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/* If destination is a signed fract, and the source was not, a shift
|
||||
by 1 bit is needed. Also restore sign from carry or T. */
|
||||
|
||||
if (sbit[dest] && !ilen[dest] && (!sbit[src] || ilen[src]))
|
||||
{
|
||||
/* We have flen[src] non-zero fractional bytes to shift.
|
||||
Because of the right shift, handle one byte more so that the
|
||||
LSB won't be lost. */
|
||||
|
||||
int nonzero = flen[src] + 1;
|
||||
|
||||
/* If the LSB is in the T flag and there are no fractional
|
||||
bits, the high byte is zero and no shift needed. */
|
||||
|
||||
if (flen[src] == 0 && sign_in_T)
|
||||
nonzero = 0;
|
||||
|
||||
start = flen[dest] - 1;
|
||||
end = start - nonzero;
|
||||
|
||||
for (i = start; i > end && i >= 0; i--)
|
||||
{
|
||||
xop[4] = GEN_INT (8 * i);
|
||||
if (i == start && !sign_in_Carry)
|
||||
avr_asm_len ("lsr %T0%t4", xop, plen, 1);
|
||||
else
|
||||
avr_asm_len ("ror %T0%t4", xop, plen, 1);
|
||||
}
|
||||
|
||||
if (sign_in_T)
|
||||
{
|
||||
avr_asm_len ("bld %T0%T2", xop, plen, 1);
|
||||
}
|
||||
}
|
||||
else if (sbit[src] && !ilen[src] && (!sbit[dest] || ilen[dest]))
|
||||
{
|
||||
/* If source was a signed fract and dest was not, shift 1 bit
|
||||
other way. */
|
||||
|
||||
start = flen[dest] - flen[src];
|
||||
|
||||
if (start < 0)
|
||||
start = 0;
|
||||
|
||||
for (i = start; i < flen[dest]; i++)
|
||||
{
|
||||
xop[4] = GEN_INT (8 * i);
|
||||
|
||||
if (i == start)
|
||||
avr_asm_len ("lsl %T0%t4", xop, plen, 1);
|
||||
else
|
||||
avr_asm_len ("rol %T0%t4", xop, plen, 1);
|
||||
}
|
||||
}
|
||||
|
||||
return "";
|
||||
}
|
||||
|
||||
|
||||
/* Modifies the length assigned to instruction INSN
|
||||
LEN is the initially computed length of the insn. */
|
||||
|
||||
@ -6489,6 +6996,8 @@ adjust_insn_length (rtx insn, int len)
|
||||
|
||||
case ADJUST_LEN_OUT_PLUS: avr_out_plus (op, &len, NULL); break;
|
||||
case ADJUST_LEN_PLUS64: avr_out_plus64 (op[0], &len); break;
|
||||
case ADJUST_LEN_MINUS: avr_out_minus (op, &len, NULL); break;
|
||||
case ADJUST_LEN_MINUS64: avr_out_minus64 (op[0], &len); break;
|
||||
case ADJUST_LEN_OUT_PLUS_NOCLOBBER:
|
||||
avr_out_plus_noclobber (op, &len, NULL); break;
|
||||
|
||||
@ -6502,6 +7011,9 @@ adjust_insn_length (rtx insn, int len)
|
||||
case ADJUST_LEN_XLOAD: avr_out_xload (insn, op, &len); break;
|
||||
case ADJUST_LEN_LOAD_LPM: avr_load_lpm (insn, op, &len); break;
|
||||
|
||||
case ADJUST_LEN_SFRACT: avr_out_fract (insn, op, true, &len); break;
|
||||
case ADJUST_LEN_UFRACT: avr_out_fract (insn, op, false, &len); break;
|
||||
|
||||
case ADJUST_LEN_TSTHI: avr_out_tsthi (insn, op, &len); break;
|
||||
case ADJUST_LEN_TSTPSI: avr_out_tstpsi (insn, op, &len); break;
|
||||
case ADJUST_LEN_TSTSI: avr_out_tstsi (insn, op, &len); break;
|
||||
@ -6683,6 +7195,20 @@ avr_assemble_integer (rtx x, unsigned int size, int aligned_p)
|
||||
|
||||
return true;
|
||||
}
|
||||
else if (CONST_FIXED_P (x))
|
||||
{
|
||||
unsigned n;
|
||||
|
||||
/* varasm fails to handle big fixed modes that don't fit in hwi. */
|
||||
|
||||
for (n = 0; n < size; n++)
|
||||
{
|
||||
rtx xn = simplify_gen_subreg (QImode, x, GET_MODE (x), n);
|
||||
default_assemble_integer (xn, 1, aligned_p);
|
||||
}
|
||||
|
||||
return true;
|
||||
}
|
||||
|
||||
return default_assemble_integer (x, size, aligned_p);
|
||||
}
|
||||
@ -7489,6 +8015,7 @@ avr_operand_rtx_cost (rtx x, enum machine_mode mode, enum rtx_code outer,
|
||||
return 0;
|
||||
|
||||
case CONST_INT:
|
||||
case CONST_FIXED:
|
||||
case CONST_DOUBLE:
|
||||
return COSTS_N_INSNS (GET_MODE_SIZE (mode));
|
||||
|
||||
@ -7518,6 +8045,7 @@ avr_rtx_costs_1 (rtx x, int codearg, int outer_code ATTRIBUTE_UNUSED,
|
||||
switch (code)
|
||||
{
|
||||
case CONST_INT:
|
||||
case CONST_FIXED:
|
||||
case CONST_DOUBLE:
|
||||
case SYMBOL_REF:
|
||||
case CONST:
|
||||
@ -8446,11 +8974,17 @@ avr_compare_pattern (rtx insn)
|
||||
if (pattern
|
||||
&& NONJUMP_INSN_P (insn)
|
||||
&& SET_DEST (pattern) == cc0_rtx
|
||||
&& GET_CODE (SET_SRC (pattern)) == COMPARE
|
||||
&& DImode != GET_MODE (XEXP (SET_SRC (pattern), 0))
|
||||
&& DImode != GET_MODE (XEXP (SET_SRC (pattern), 1)))
|
||||
&& GET_CODE (SET_SRC (pattern)) == COMPARE)
|
||||
{
|
||||
return pattern;
|
||||
enum machine_mode mode0 = GET_MODE (XEXP (SET_SRC (pattern), 0));
|
||||
enum machine_mode mode1 = GET_MODE (XEXP (SET_SRC (pattern), 1));
|
||||
|
||||
/* The 64-bit comparisons have fixed operands ACC_A and ACC_B.
|
||||
They must not be swapped, thus skip them. */
|
||||
|
||||
if ((mode0 == VOIDmode || GET_MODE_SIZE (mode0) <= 4)
|
||||
&& (mode1 == VOIDmode || GET_MODE_SIZE (mode1) <= 4))
|
||||
return pattern;
|
||||
}
|
||||
|
||||
return NULL_RTX;
|
||||
@ -8788,6 +9322,8 @@ avr_2word_insn_p (rtx insn)
|
||||
return false;
|
||||
|
||||
case CODE_FOR_movqi_insn:
|
||||
case CODE_FOR_movuqq_insn:
|
||||
case CODE_FOR_movqq_insn:
|
||||
{
|
||||
rtx set = single_set (insn);
|
||||
rtx src = SET_SRC (set);
|
||||
@ -8796,7 +9332,7 @@ avr_2word_insn_p (rtx insn)
|
||||
/* Factor out LDS and STS from movqi_insn. */
|
||||
|
||||
if (MEM_P (dest)
|
||||
&& (REG_P (src) || src == const0_rtx))
|
||||
&& (REG_P (src) || src == CONST0_RTX (GET_MODE (dest))))
|
||||
{
|
||||
return CONSTANT_ADDRESS_P (XEXP (dest, 0));
|
||||
}
|
||||
@ -9021,7 +9557,7 @@ output_reload_in_const (rtx *op, rtx clobber_reg, int *len, bool clear_p)
|
||||
|
||||
if (NULL_RTX == clobber_reg
|
||||
&& !test_hard_reg_class (LD_REGS, dest)
|
||||
&& (! (CONST_INT_P (src) || CONST_DOUBLE_P (src))
|
||||
&& (! (CONST_INT_P (src) || CONST_FIXED_P (src) || CONST_DOUBLE_P (src))
|
||||
|| !avr_popcount_each_byte (src, n_bytes,
|
||||
(1 << 0) | (1 << 1) | (1 << 8))))
|
||||
{
|
||||
@ -9048,6 +9584,7 @@ output_reload_in_const (rtx *op, rtx clobber_reg, int *len, bool clear_p)
|
||||
ldreg_p = test_hard_reg_class (LD_REGS, xdest[n]);
|
||||
|
||||
if (!CONST_INT_P (src)
|
||||
&& !CONST_FIXED_P (src)
|
||||
&& !CONST_DOUBLE_P (src))
|
||||
{
|
||||
static const char* const asm_code[][2] =
|
||||
@ -9239,6 +9776,7 @@ output_reload_insisf (rtx *op, rtx clobber_reg, int *len)
|
||||
if (AVR_HAVE_MOVW
|
||||
&& !test_hard_reg_class (LD_REGS, op[0])
|
||||
&& (CONST_INT_P (op[1])
|
||||
|| CONST_FIXED_P (op[1])
|
||||
|| CONST_DOUBLE_P (op[1])))
|
||||
{
|
||||
int len_clr, len_noclr;
|
||||
@ -10834,6 +11372,12 @@ avr_fold_builtin (tree fndecl, int n_args ATTRIBUTE_UNUSED, tree *arg,
|
||||
#undef TARGET_SCALAR_MODE_SUPPORTED_P
|
||||
#define TARGET_SCALAR_MODE_SUPPORTED_P avr_scalar_mode_supported_p
|
||||
|
||||
#undef TARGET_BUILD_BUILTIN_VA_LIST
|
||||
#define TARGET_BUILD_BUILTIN_VA_LIST avr_build_builtin_va_list
|
||||
|
||||
#undef TARGET_FIXED_POINT_SUPPORTED_P
|
||||
#define TARGET_FIXED_POINT_SUPPORTED_P hook_bool_void_true
|
||||
|
||||
#undef TARGET_ADDR_SPACE_SUBSET_P
|
||||
#define TARGET_ADDR_SPACE_SUBSET_P avr_addr_space_subset_p
|
||||
|
||||
|
@ -261,6 +261,7 @@ enum
|
||||
#define FLOAT_TYPE_SIZE 32
|
||||
#define DOUBLE_TYPE_SIZE 32
|
||||
#define LONG_DOUBLE_TYPE_SIZE 32
|
||||
#define LONG_LONG_ACCUM_TYPE_SIZE 64
|
||||
|
||||
#define DEFAULT_SIGNED_CHAR 1
|
||||
|
||||
|
File diff suppressed because it is too large
Load Diff
@ -192,3 +192,47 @@
|
||||
"32-bit integer constant where no nibble equals 0xf."
|
||||
(and (match_code "const_int")
|
||||
(match_test "!avr_has_nibble_0xf (op)")))
|
||||
|
||||
;; CONST_FIXED is no element of 'n' so cook our own.
|
||||
;; "i" or "s" would match but because the insn uses iterators that cover
|
||||
;; INT_MODE, "i" or "s" is not always possible.
|
||||
|
||||
(define_constraint "Ynn"
|
||||
"Fixed-point constant known at compile time."
|
||||
(match_code "const_fixed"))
|
||||
|
||||
(define_constraint "Y00"
|
||||
"Fixed-point or integer constant with bit representation 0x0"
|
||||
(and (match_code "const_fixed,const_int")
|
||||
(match_test "op == CONST0_RTX (GET_MODE (op))")))
|
||||
|
||||
(define_constraint "Y01"
|
||||
"Fixed-point or integer constant with bit representation 0x1"
|
||||
(ior (and (match_code "const_fixed")
|
||||
(match_test "1 == INTVAL (avr_to_int_mode (op))"))
|
||||
(match_test "satisfies_constraint_P (op)")))
|
||||
|
||||
(define_constraint "Ym1"
|
||||
"Fixed-point or integer constant with bit representation -0x1"
|
||||
(ior (and (match_code "const_fixed")
|
||||
(match_test "-1 == INTVAL (avr_to_int_mode (op))"))
|
||||
(match_test "satisfies_constraint_N (op)")))
|
||||
|
||||
(define_constraint "Y02"
|
||||
"Fixed-point or integer constant with bit representation 0x2"
|
||||
(ior (and (match_code "const_fixed")
|
||||
(match_test "2 == INTVAL (avr_to_int_mode (op))"))
|
||||
(match_test "satisfies_constraint_K (op)")))
|
||||
|
||||
(define_constraint "Ym2"
|
||||
"Fixed-point or integer constant with bit representation -0x2"
|
||||
(ior (and (match_code "const_fixed")
|
||||
(match_test "-2 == INTVAL (avr_to_int_mode (op))"))
|
||||
(match_test "satisfies_constraint_Cm2 (op)")))
|
||||
|
||||
;; Similar to "IJ" used with ADIW/SBIW, but for CONST_FIXED.
|
||||
|
||||
(define_constraint "YIJ"
|
||||
"Fixed-point constant from @minus{}0x003f to 0x003f."
|
||||
(and (match_code "const_fixed")
|
||||
(match_test "IN_RANGE (INTVAL (avr_to_int_mode (op)), -63, 63)")))
|
||||
|
@ -74,7 +74,7 @@
|
||||
|
||||
;; Return 1 if OP is the zero constant for MODE.
|
||||
(define_predicate "const0_operand"
|
||||
(and (match_code "const_int,const_double")
|
||||
(and (match_code "const_int,const_fixed,const_double")
|
||||
(match_test "op == CONST0_RTX (mode)")))
|
||||
|
||||
;; Return 1 if OP is the one constant integer for MODE.
|
||||
@ -248,3 +248,21 @@
|
||||
(define_predicate "o16_operand"
|
||||
(and (match_code "const_int")
|
||||
(match_test "IN_RANGE (INTVAL (op), -(1<<16), -1)")))
|
||||
|
||||
;; Const int, fixed, or double operand
|
||||
(define_predicate "const_operand"
|
||||
(ior (match_code "const_fixed")
|
||||
(match_code "const_double")
|
||||
(match_operand 0 "const_int_operand")))
|
||||
|
||||
;; Const int, const fixed, or const double operand
|
||||
(define_predicate "nonmemory_or_const_operand"
|
||||
(ior (match_code "const_fixed")
|
||||
(match_code "const_double")
|
||||
(match_operand 0 "nonmemory_operand")))
|
||||
|
||||
;; Immediate, const fixed, or const double operand
|
||||
(define_predicate "const_or_immediate_operand"
|
||||
(ior (match_code "const_fixed")
|
||||
(match_code "const_double")
|
||||
(match_operand 0 "immediate_operand")))
|
||||
|
@ -1,3 +1,23 @@
|
||||
2012-08-24 Georg-Johann Lay <avr@gjlay.de>
|
||||
|
||||
PR target/54222
|
||||
* config/avr/lib1funcs-fixed.S: New file.
|
||||
* config/avr/lib1funcs.S: Include it. Undefine some divmodsi
|
||||
after they are used.
|
||||
(neg2, neg4): New macros.
|
||||
(__mulqihi3,__umulqihi3,__mulhi3): Rewrite non-MUL variants.
|
||||
(__mulhisi3,__umulhisi3,__mulsi3): Rewrite non-MUL variants.
|
||||
(__umulhisi3): Speed up MUL variant if there is enough flash.
|
||||
* config/avr/avr-lib.h (TA, UTA): Adjust according to gcc's
|
||||
avr-modes.def.
|
||||
* config/avr/t-avr (LIB1ASMFUNCS): Add: _fractqqsf, _fractuqqsf,
|
||||
_fracthqsf, _fractuhqsf, _fracthasf, _fractuhasf, _fractsasf,
|
||||
_fractusasf, _fractsfqq, _fractsfuqq, _fractsfhq, _fractsfuhq,
|
||||
_fractsfha, _fractsfsa, _mulqq3, _muluqq3, _mulhq3, _muluhq3,
|
||||
_mulha3, _muluha3, _mulsa3, _mulusa3, _divqq3, _udivuqq3, _divhq3,
|
||||
_udivuhq3, _divha3, _udivuha3, _divsa3, _udivusa3.
|
||||
(LIB2FUNCS_EXCLUDE): Add supported functions.
|
||||
|
||||
2012-08-22 Georg-Johann Lay <avr@gjlay.de>
|
||||
|
||||
* Makefile.in (fixed-funcs,fixed-conv-funcs): filter-out
|
||||
|
@ -4,3 +4,79 @@
|
||||
#define DI SI
|
||||
typedef int QItype __attribute__ ((mode (QI)));
|
||||
#endif
|
||||
|
||||
/* fixed-bit.h does not define functions for TA and UTA because
   that part is wrapped in #if MIN_UNITS_PER_WORD > 4.
   This would lead to empty functions for TA and UTA.
   Thus, supply appropriate defines as if HAVE_[U]TA == 1.
   #define HAVE_[U]TA 1 won't work because avr-modes.def
   uses ADJUST_BYTESIZE(TA,8) and fixed-bit.h is not generic enough
   to arrange for such changes of the mode size.  */

/* 8-byte unsigned accum type in UTA mode (see avr-modes.def).  */
typedef unsigned _Fract UTAtype __attribute__ ((mode (UTA)));

/* Parameter block consumed by fixed-bit.c when building the UTA
   arithmetic routines: mode name, its integer container types and
   half-size container types, signedness and byte size.  */
#if defined (UTA_MODE)
#define FIXED_SIZE      8       /* in bytes */
#define INT_C_TYPE      UDItype
#define UINT_C_TYPE     UDItype
#define HINT_C_TYPE     USItype
#define HUINT_C_TYPE    USItype
#define MODE_NAME       UTA
#define MODE_NAME_S     uta
#define MODE_UNSIGNED   1
#endif

/* Source resp. destination descriptions for the UTA conversion
   routines generated by fixed-bit.c (TYPE 4 = fixed-point).  */
#if defined (FROM_UTA)
#define FROM_TYPE               4       /* ID for fixed-point */
#define FROM_MODE_NAME          UTA
#define FROM_MODE_NAME_S        uta
#define FROM_INT_C_TYPE         UDItype
#define FROM_SINT_C_TYPE        DItype
#define FROM_UINT_C_TYPE        UDItype
#define FROM_MODE_UNSIGNED      1
#define FROM_FIXED_SIZE         8       /* in bytes */
#elif defined (TO_UTA)
#define TO_TYPE                 4       /* ID for fixed-point */
#define TO_MODE_NAME            UTA
#define TO_MODE_NAME_S          uta
#define TO_INT_C_TYPE           UDItype
#define TO_SINT_C_TYPE          DItype
#define TO_UINT_C_TYPE          UDItype
#define TO_MODE_UNSIGNED        1
#define TO_FIXED_SIZE           8       /* in bytes */
#endif
|
||||
|
||||
/* Same for TAmode: signed counterpart of the UTA block above.
   Only the signedness and the signed container type differ.  */

/* 8-byte signed accum type in TA mode (see avr-modes.def).  */
typedef _Fract TAtype  __attribute__ ((mode (TA)));

#if defined (TA_MODE)
#define FIXED_SIZE      8       /* in bytes */
#define INT_C_TYPE      DItype
#define UINT_C_TYPE     UDItype
#define HINT_C_TYPE     SItype
#define HUINT_C_TYPE    USItype
#define MODE_NAME       TA
#define MODE_NAME_S     ta
#define MODE_UNSIGNED   0
#endif

#if defined (FROM_TA)
#define FROM_TYPE               4       /* ID for fixed-point */
#define FROM_MODE_NAME          TA
#define FROM_MODE_NAME_S        ta
#define FROM_INT_C_TYPE         DItype
#define FROM_SINT_C_TYPE        DItype
#define FROM_UINT_C_TYPE        UDItype
#define FROM_MODE_UNSIGNED      0
#define FROM_FIXED_SIZE         8       /* in bytes */
#elif defined (TO_TA)
#define TO_TYPE                 4       /* ID for fixed-point */
#define TO_MODE_NAME            TA
#define TO_MODE_NAME_S          ta
#define TO_INT_C_TYPE           DItype
#define TO_SINT_C_TYPE          DItype
#define TO_UINT_C_TYPE          UDItype
#define TO_MODE_UNSIGNED        0
#define TO_FIXED_SIZE           8       /* in bytes */
#endif
|
||||
|
874
libgcc/config/avr/lib1funcs-fixed.S
Normal file
874
libgcc/config/avr/lib1funcs-fixed.S
Normal file
@ -0,0 +1,874 @@
|
||||
/* -*- Mode: Asm -*- */
|
||||
;; Copyright (C) 2012
|
||||
;; Free Software Foundation, Inc.
|
||||
;; Contributed by Sean D'Epagnier (sean@depagnier.com)
|
||||
;; Georg-Johann Lay (avr@gjlay.de)
|
||||
|
||||
;; This file is free software; you can redistribute it and/or modify it
|
||||
;; under the terms of the GNU General Public License as published by the
|
||||
;; Free Software Foundation; either version 3, or (at your option) any
|
||||
;; later version.
|
||||
|
||||
;; In addition to the permissions in the GNU General Public License, the
|
||||
;; Free Software Foundation gives you unlimited permission to link the
|
||||
;; compiled version of this file into combinations with other programs,
|
||||
;; and to distribute those combinations without any restriction coming
|
||||
;; from the use of this file. (The General Public License restrictions
|
||||
;; do apply in other respects; for example, they cover modification of
|
||||
;; the file, and distribution when not linked into a combine
|
||||
;; executable.)
|
||||
|
||||
;; This file is distributed in the hope that it will be useful, but
|
||||
;; WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
|
||||
;; General Public License for more details.
|
||||
|
||||
;; You should have received a copy of the GNU General Public License
|
||||
;; along with this program; see the file COPYING. If not, write to
|
||||
;; the Free Software Foundation, 51 Franklin Street, Fifth Floor,
|
||||
;; Boston, MA 02110-1301, USA.
|
||||
|
||||
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
|
||||
;; Fixed point library routines for AVR
|
||||
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
|
||||
|
||||
.section .text.libgcc.fixed, "ax", @progbits
|
||||
|
||||
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
|
||||
;; Conversions to float
|
||||
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
|
||||
|
||||
#if defined (L_fractqqsf)
;; QQ (s.7 in R24) -> SF.  Widens the operand into the SA (s16.15-ish)
;; register layout R25:R22 and tail-calls __fractsasf to do the work.
DEFUN   __fractqqsf
    ;; Move in place for SA -> SF conversion
    clr     r22
    mov     r23, r24
    lsl     r23                 ; carry = sign bit of the QQ value
    ;; Sign-extend
    sbc     r24, r24            ; 0x00 or 0xff from the carry above
    mov     r25, r24
    XJMP    __fractsasf
ENDF    __fractqqsf
#endif  /* L_fractqqsf */
|
||||
|
||||
#if defined (L_fractuqqsf)
;; UQQ (.8 in R24) -> SF.  Widens into the USA register layout
;; R25:R22 and tail-calls __fractusasf.
DEFUN   __fractuqqsf
    ;; Move in place for USA -> SF conversion
    clr     r22
    mov     r23, r24
    ;; Zero-extend
    clr     r24
    clr     r25
    XJMP    __fractusasf
ENDF    __fractuqqsf
#endif  /* L_fractuqqsf */
|
||||
|
||||
#if defined (L_fracthqsf)
;; HQ (s.15 in R25:R24) -> SF via the SA conversion helper.
DEFUN   __fracthqsf
    ;; Move in place for SA -> SF conversion
    wmov    22, 24
    lsl     r22
    rol     r23                 ; carry = sign bit
    ;; Sign-extend
    sbc     r24, r24
    mov     r25, r24
    XJMP    __fractsasf
ENDF    __fracthqsf
#endif  /* L_fracthqsf */
|
||||
|
||||
#if defined (L_fractuhqsf)
;; UHQ (.16 in R25:R24) -> SF via the USA conversion helper.
DEFUN   __fractuhqsf
    ;; Move in place for USA -> SF conversion
    wmov    22, 24
    ;; Zero-extend
    clr     r24
    clr     r25
    XJMP    __fractusasf
ENDF    __fractuhqsf
#endif  /* L_fractuhqsf */
|
||||
|
||||
#if defined (L_fracthasf)
;; HA (16-bit accum in R25:R24) -> SF via the SA conversion helper.
DEFUN   __fracthasf
    ;; Move in place for SA -> SF conversion
    clr     r22
    mov     r23, r24
    mov     r24, r25
    ;; Sign-extend
    lsl     r25                 ; carry = sign bit
    sbc     r25, r25
    XJMP    __fractsasf
ENDF    __fracthasf
#endif  /* L_fracthasf */
|
||||
|
||||
#if defined (L_fractuhasf)
;; UHA (16-bit unsigned accum in R25:R24) -> SF via the USA helper.
DEFUN   __fractuhasf
    ;; Move in place for USA -> SF conversion
    clr     r22
    mov     r23, r24
    mov     r24, r25
    ;; Zero-extend
    clr     r25
    XJMP    __fractusasf
ENDF    __fractuhasf
#endif  /* L_fractuhasf */
|
||||
|
||||
|
||||
#if defined (L_fractsqsf)
;; SQ (s.31 in R25:R22) -> SF.  Converts as integer, then rescales
;; by adjusting the SF exponent field directly (subtracting 31 from
;; the biased exponent == dividing by 2^31).
DEFUN   __fractsqsf
    XCALL   __floatsisf
    ;; Divide non-zero results by 2^31 to move the
    ;; decimal point into place
    tst     r25                 ; 0.0 has a zero high byte; don't touch it
    breq    0f
    subi    r24, exp_lo (31)
    sbci    r25, exp_hi (31)
0:  ret
ENDF    __fractsqsf
#endif  /* L_fractsqsf */
|
||||
|
||||
#if defined (L_fractusqsf)
;; USQ (.32 in R25:R22) -> SF: integer convert, then exponent -= 32.
DEFUN   __fractusqsf
    XCALL   __floatunsisf
    ;; Divide non-zero results by 2^32 to move the
    ;; decimal point into place
    cpse    r25, __zero_reg__   ; skip the rescale for 0.0
    subi    r25, exp_hi (32)
    ret
ENDF    __fractusqsf
#endif  /* L_fractusqsf */
|
||||
|
||||
#if defined (L_fractsasf)
;; SA (32-bit signed accum in R25:R22) -> SF: integer convert,
;; then exponent -= 16.  Also the workhorse for the 8/16-bit
;; signed conversions above, which widen into SA layout first.
DEFUN   __fractsasf
    XCALL   __floatsisf
    ;; Divide non-zero results by 2^16 to move the
    ;; decimal point into place
    cpse    r25, __zero_reg__
    subi    r25, exp_hi (16)
    ret
ENDF    __fractsasf
#endif  /* L_fractsasf */
|
||||
|
||||
#if defined (L_fractusasf)
;; USA (32-bit unsigned accum in R25:R22) -> SF; unsigned twin of
;; __fractsasf and shared backend of the unsigned conversions above.
DEFUN   __fractusasf
    XCALL   __floatunsisf
    ;; Divide non-zero results by 2^16 to move the
    ;; decimal point into place
    cpse    r25, __zero_reg__
    subi    r25, exp_hi (16)
    ret
ENDF    __fractusasf
#endif  /* L_fractusasf */
|
||||
|
||||
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
|
||||
;; Conversions from float
|
||||
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
|
||||
|
||||
#if defined (L_fractsfqq)
;; SF -> QQ (s.7 returned in R24).  Pre-scales the float by bumping
;; its exponent, converts to a 32-bit integer and keeps the top byte.
DEFUN   __fractsfqq
    ;; Multiply with 2^{24+7} to get a QQ result in r25
    subi    r24, exp_lo (-31)
    sbci    r25, exp_hi (-31)
    XCALL   __fixsfsi
    mov     r24, r25
    ret
ENDF    __fractsfqq
#endif  /* L_fractsfqq */
|
||||
|
||||
#if defined (L_fractsfuqq)
;; SF -> UQQ (.8 returned in R24); unsigned twin of __fractsfqq.
DEFUN   __fractsfuqq
    ;; Multiply with 2^{24+8} to get a UQQ result in r25
    subi    r25, exp_hi (-32)
    XCALL   __fixunssfsi
    mov     r24, r25
    ret
ENDF    __fractsfuqq
#endif  /* L_fractsfuqq */
|
||||
|
||||
#if defined (L_fractsfha)
;; SF -> HA.  Scale by 2^24 via the exponent, then fix to integer;
;; the HA value ends up in the low words of the SI result.
DEFUN   __fractsfha
    ;; Multiply with 2^24 to get a HA result in r25:r24
    subi    r25, exp_hi (-24)
    XJMP    __fixsfsi
ENDF    __fractsfha
#endif  /* L_fractsfha */
|
||||
|
||||
#if defined (L_fractsfuha)
;; SF -> UHA; unsigned twin of __fractsfha.
DEFUN   __fractsfuha
    ;; Multiply with 2^24 to get a UHA result in r25:r24
    subi    r25, exp_hi (-24)
    XJMP    __fixunssfsi
ENDF    __fractsfuha
#endif  /* L_fractsfuha */
|
||||
|
||||
#if defined (L_fractsfhq)
;; __fractsfsq is an empty DEFUN/ENDF pair: it creates an alias
;; label so SF -> SQ falls through into the same code as SF -> HQ
;; (same 2^31 scale; only the consumed result bytes differ).
DEFUN   __fractsfsq
ENDF    __fractsfsq

DEFUN   __fractsfhq
    ;; Multiply with 2^{16+15} to get a HQ result in r25:r24
    ;; resp. with 2^31 to get a SQ result in r25:r22
    subi    r24, exp_lo (-31)
    sbci    r25, exp_hi (-31)
    XJMP    __fixsfsi
ENDF    __fractsfhq
#endif  /* L_fractsfhq */
|
||||
|
||||
#if defined (L_fractsfuhq)
;; __fractsfusq aliases into __fractsfuhq, same trick as the
;; signed pair above: one body serves SF -> UHQ and SF -> USQ.
DEFUN   __fractsfusq
ENDF    __fractsfusq

DEFUN   __fractsfuhq
    ;; Multiply with 2^{16+16} to get a UHQ result in r25:r24
    ;; resp. with 2^32 to get a USQ result in r25:r22
    subi    r25, exp_hi (-32)
    XJMP    __fixunssfsi
ENDF    __fractsfuhq
#endif  /* L_fractsfuhq */
|
||||
|
||||
#if defined (L_fractsfsa)
;; SF -> SA: scale by 2^16 via the exponent, then fix to SI.
DEFUN   __fractsfsa
    ;; Multiply with 2^16 to get a SA result in r25:r22
    subi    r25, exp_hi (-16)
    XJMP    __fixsfsi
ENDF    __fractsfsa
#endif  /* L_fractsfsa */
|
||||
|
||||
#if defined (L_fractsfusa)
;; SF -> USA; unsigned twin of __fractsfsa.
DEFUN   __fractsfusa
    ;; Multiply with 2^16 to get a USA result in r25:r22
    subi    r25, exp_hi (-16)
    XJMP    __fixunssfsi
ENDF    __fractsfusa
#endif  /* L_fractsfusa */
|
||||
|
||||
|
||||
;; For multiplication the functions here are called directly from
|
||||
;; avr-fixed.md instead of using the standard libcall mechanisms.
|
||||
;; This can make better code because GCC knows exactly which
|
||||
;; of the call-used registers (not all of them) are clobbered. */
|
||||
|
||||
/*******************************************************
|
||||
Fractional Multiplication 8 x 8 without MUL
|
||||
*******************************************************/
|
||||
|
||||
#if defined (L_mulqq3) && !defined (__AVR_HAVE_MUL__)
;;; R23 = R24 * R25  -- signed fractional 8 x 8 multiply (QQ),
;;; built on the __fmuls emulation for devices without MUL.
;;; Clobbers: __tmp_reg__, R22, R24, R25
;;; Rounding:  ???
DEFUN   __mulqq3
    XCALL   __fmuls
    ;; TR 18037 requires that  (-1) * (-1)  does not overflow
    ;; The only input that can produce  -1  is  (-1)^2.
    dec     r23                 ; 0x80 -> 0x7f sets V; any other value doesn't
    brvs    0f
    inc     r23                 ; not the overflow case: undo the probe
0:  ret
ENDF    __mulqq3
#endif /* L_mulqq3 && ! HAVE_MUL */
|
||||
|
||||
/*******************************************************
|
||||
Fractional Multiply .16 x .16 with and without MUL
|
||||
*******************************************************/
|
||||
|
||||
#if defined (L_mulhq3)
;;; Signed fractional 16 x 16 multiply (HQ): widen-multiply to 32
;;; bits, shift the s.30 product left into s.31, round, saturate.
;;; Same code with and without MUL, but the interfaces differ:
;;; no MUL: (R25:R24) = (R22:R23) * (R24:R25)
;;;         Clobbers: ABI, called by optabs
;;; MUL:    (R25:R24) = (R19:R18) * (R27:R26)
;;;         Clobbers: __tmp_reg__, R22, R23
;;; Rounding:  -0.5 LSB <= error <= 0.5 LSB
DEFUN   __mulhq3
    XCALL   __mulhisi3
    ;; Shift result into place
    lsl     r23
    rol     r24
    rol     r25
    brvs    1f                  ; V set only for (-1)*(-1); see below
    ;; Round
    sbrc    r23, 7
    adiw    r24, 1
    ret
1:  ;; Overflow.  TR 18037 requires  (-1)^2  not to overflow
    ;; -> clamp to the most positive HQ value 0x7fff
    ldi     r24, lo8 (0x7fff)
    ldi     r25, hi8 (0x7fff)
    ret
ENDF    __mulhq3
#endif /* defined (L_mulhq3) */
|
||||
|
||||
#if defined (L_muluhq3)
;;; Unsigned fractional 16 x 16 multiply (UHQ): the high half of
;;; the 32-bit product is the result; bit 7 of the low half rounds.
;;; Same code with and without MUL, but the interfaces differ:
;;; no MUL: (R25:R24) *= (R23:R22)
;;;         Clobbers: ABI, called by optabs
;;; MUL:    (R25:R24) = (R19:R18) * (R27:R26)
;;;         Clobbers: __tmp_reg__, R22, R23
;;; Rounding:  -0.5 LSB < error <= 0.5 LSB
DEFUN   __muluhq3
    XCALL   __umulhisi3
    ;; Round
    sbrc    r23, 7
    adiw    r24, 1
    ret
ENDF    __muluhq3
#endif  /* L_muluhq3 */
|
||||
|
||||
|
||||
/*******************************************************
|
||||
Fixed Multiply 8.8 x 8.8 with and without MUL
|
||||
*******************************************************/
|
||||
|
||||
#if defined (L_mulha3)
;;; Signed 8.8 x 8.8 fixed multiply (HA): widen-multiply, then share
;;; the shift-and-round tail with the unsigned variant.
;;; Same code with and without MUL, but the interfaces differ:
;;; no MUL: (R25:R24) = (R22:R23) * (R24:R25)
;;;         Clobbers: ABI, called by optabs
;;; MUL:    (R25:R24) = (R19:R18) * (R27:R26)
;;;         Clobbers: __tmp_reg__, R22, R23
;;; Rounding:  -0.5 LSB <= error <= 0.5 LSB
DEFUN   __mulha3
    XCALL   __mulhisi3
    XJMP    __muluha3_round
ENDF    __mulha3
#endif  /* L_mulha3 */
|
||||
|
||||
#if defined (L_muluha3)
;;; Unsigned 8.8 x 8.8 fixed multiply (UHA); same tail as __mulha3.
;;; Same code with and without MUL, but the interfaces differ:
;;; no MUL: (R25:R24) *= (R23:R22)
;;;         Clobbers: ABI, called by optabs
;;; MUL:    (R25:R24) = (R19:R18) * (R27:R26)
;;;         Clobbers: __tmp_reg__, R22, R23
;;; Rounding:  -0.5 LSB < error <= 0.5 LSB
DEFUN   __muluha3
    XCALL   __umulhisi3
    XJMP    __muluha3_round
ENDF    __muluha3
#endif  /* L_muluha3 */
|
||||
|
||||
#if defined (L_muluha3_round)
;;; Common tail for __mulha3 / __muluha3: pick the middle 16 bits
;;; of the 32-bit product (= drop 8 fractional LSBs and the top
;;; byte) and round on the highest dropped bit.
DEFUN   __muluha3_round
    ;; Shift result into place
    mov     r25, r24
    mov     r24, r23
    ;; Round
    sbrc    r22, 7
    adiw    r24, 1
    ret
ENDF    __muluha3_round
#endif  /* L_muluha3_round */
|
||||
|
||||
|
||||
/*******************************************************
|
||||
Fixed Multiplication 16.16 x 16.16
|
||||
*******************************************************/
|
||||
|
||||
#if defined (__AVR_HAVE_MUL__)

;; Register layout for the MUL-based 16.16 multiply below.

;; Multiplier
#define A0  16
#define A1  A0+1
#define A2  A1+1
#define A3  A2+1

;; Multiplicand
#define B0  20
#define B1  B0+1
#define B2  B1+1
#define B3  B2+1

;; Result
#define C0  24
#define C1  C0+1
#define C2  C1+1
#define C3  C2+1
|
||||
|
||||
#if defined (L_mulusa3)
;;; Unsigned 16.16 x 16.16 fixed multiply, MUL variant.
;;; (C3:C0) = (A3:A0) * (B3:B0)
;;; Clobbers: __tmp_reg__
;;; Rounding:  -0.5 LSB < error <= 0.5 LSB
DEFUN   __mulusa3
    ;; Some of the MUL instructions have LSBs outside the result.
    ;; Don't ignore these LSBs in order to tame rounding error.
    ;; Use C2/C3 for these LSBs.

    clr     C0
    clr     C1
    mul     A0, B0  $  movw C2, r0

    mul     A1, B0  $  add  C3, r0  $  adc C0, r1
    mul     A0, B1  $  add  C3, r0  $  adc C0, r1  $  rol C1

    ;; Round on the highest bit below the result
    sbrc    C3, 7
    adiw    C0, 1

    ;; The following MULs don't have LSBs outside the result.
    ;; C2/C3 is the high part.

    ;; sbc/sbci accumulate the carries out of r1 as a negative
    ;; count in C2; NEG turns it into the proper byte.
    mul     A0, B2  $  add  C0, r0  $  adc C1, r1  $  sbc  C2, C2
    mul     A1, B1  $  add  C0, r0  $  adc C1, r1  $  sbci C2, 0
    mul     A2, B0  $  add  C0, r0  $  adc C1, r1  $  sbci C2, 0
    neg     C2

    mul     A0, B3  $  add  C1, r0  $  adc C2, r1  $  sbc  C3, C3
    mul     A1, B2  $  add  C1, r0  $  adc C2, r1  $  sbci C3, 0
    mul     A2, B1  $  add  C1, r0  $  adc C2, r1  $  sbci C3, 0
    mul     A3, B0  $  add  C1, r0  $  adc C2, r1  $  sbci C3, 0
    neg     C3

    mul     A1, B3  $  add  C2, r0  $  adc C3, r1
    mul     A2, B2  $  add  C2, r0  $  adc C3, r1
    mul     A3, B1  $  add  C2, r0  $  adc C3, r1

    mul     A2, B3  $  add  C3, r0
    mul     A3, B2  $  add  C3, r0

    clr     __zero_reg__        ; MUL leaves the product high byte in r1
    ret
ENDF    __mulusa3
#endif  /* L_mulusa3 */
|
||||
|
||||
#if defined (L_mulsa3)
;;; Signed 16.16 x 16.16 fixed multiply, MUL variant.
;;; (C3:C0) = (A3:A0) * (B3:B0)
;;; Clobbers: __tmp_reg__
;;; Rounding:  -0.5 LSB <= error <= 0.5 LSB
;;; Does the unsigned multiply, then corrects the high part for each
;;; negative operand (signed = unsigned - (other << 32) per sign).
DEFUN   __mulsa3
    XCALL   __mulusa3
    tst     B3
    brpl    1f                  ; B >= 0: no correction for B
    sub     C2, A0
    sbc     C3, A1
1:  sbrs    A3, 7               ; A >= 0: done
    ret
    sub     C2, B0
    sbc     C3, B1
    ret
ENDF    __mulsa3
#endif  /* L_mulsa3 */
|
||||
|
||||
;; Retire the MUL-variant register names before the non-MUL layout.
#undef A0
#undef A1
#undef A2
#undef A3
#undef B0
#undef B1
#undef B2
#undef B3
#undef C0
#undef C1
#undef C2
#undef C3

#else /* __AVR_HAVE_MUL__ */

;; Register layout for the shift-and-add 16.16 multiply below.
;; B overlays C: B is consumed as the loop counter while C is built
;; up in CC, then CC is copied over B at the end.

#define A0  18
#define A1  A0+1
#define A2  A0+2
#define A3  A0+3

#define B0  22
#define B1  B0+1
#define B2  B0+2
#define B3  B0+3

#define C0  22
#define C1  C0+1
#define C2  C0+2
#define C3  C0+3

;; Accumulator; low two bytes live in the fixed registers.
;; __tmp_reg__
#define CC0  0
;; __zero_reg__
#define CC1  1
#define CC2  16
#define CC3  17

;; Copy of the multiplicand A, parked in the pointer registers.
#define AA0  26
#define AA1  AA0+1
#define AA2  30
#define AA3  AA2+1
|
||||
|
||||
#if defined (L_mulsa3)
;;; Signed 16.16 x 16.16 fixed multiply, non-MUL variant.
;;; (R25:R22) *= (R21:R18)
;;; Clobbers: ABI, called by optabs
;;; Rounding:  -1 LSB <= error <= 1 LSB
;;; Runs the unsigned multiply, then applies the sign correction
;;; twice through the same code at 1: -- once for B (sign saved in
;;; T before B is destroyed), once for A (survives in AA).
DEFUN   __mulsa3
    push    B0                  ; save B low word: B is clobbered below
    push    B1
    bst     B3, 7               ; T = sign of B
    XCALL   __mulusa3
    ;; A survived in 31:30:27:26
    rcall   1f                  ; correction for sign of B (T flag)
    pop     AA1                 ; reuse AA for the saved B low word
    pop     AA0
    bst     AA3, 7              ; T = sign of A
1:  brtc    9f
    ;; 1-extend A/B
    sub     C2, AA0
    sbc     C3, AA1
9:  ret
ENDF    __mulsa3
#endif  /* L_mulsa3 */
|
||||
|
||||
#if defined (L_mulusa3)
;;; Unsigned 16.16 x 16.16 fixed multiply, non-MUL shift-and-add.
;;; (R25:R22) *= (R21:R18)
;;; Clobbers: ABI, called by optabs and __mulsua
;;; Rounding:  -1 LSB <= error <= 1 LSB
;;; Does not clobber T and A[] survives in 26, 27, 30, 31
;;; Two loops: integral bits of B shift A left, fractional bits of B
;;; shift A right (with guard bits in B2/B3 for rounding).
DEFUN   __mulusa3
    push    CC2                 ; CC2/CC3 are call-saved (r16/r17)
    push    CC3
    ; clear result (CC1 == __zero_reg__ is already 0)
    clr     __tmp_reg__
    wmov    CC2, CC0
    ; save multiplicand
    wmov    AA0, A0
    wmov    AA2, A2
    rjmp    3f

    ;; Loop the integral part

1:  ;; CC += A * 2^n;  n >= 0
    add  CC0,A0  $  adc CC1,A1  $  adc CC2,A2  $  adc CC3,A3

2:  ;; A <<= 1
    lsl  A0      $  rol A1      $  rol A2      $  rol A3

3:  ;; IBIT(B) >>= 1
    ;; Carry = n-th bit of B;  n >= 0
    lsr     B3
    ror     B2
    brcs    1b
    ;; SBCI keeps Z cumulative: Z stays set only while B3:B2 == 0,
    ;; so this exits once all integral bits are consumed.
    sbci    B3, 0
    brne    2b

    ;; Loop the fractional part
    ;; B2/B3 is 0 now, use as guard bits for rounding
    ;; Restore multiplicand
    wmov    A0, AA0
    wmov    A2, AA2
    rjmp    5f

4:  ;; CC += A:Guard * 2^n;  n < 0
    add  B3,B2  $  adc CC0,A0  $  adc CC1,A1  $  adc CC2,A2  $  adc CC3,A3
5:
    ;; A:Guard >>= 1
    lsr  A3  $  ror A2  $  ror A1  $  ror A0  $  ror B2

    ;; FBIT(B) <<= 1
    ;; Carry = n-th bit of B;  n < 0
    lsl     B0
    rol     B1
    brcs    4b
    ;; Same cumulative-Z exit test as above, for B1:B0.
    sbci    B0, 0
    brne    5b

    ;; Move result into place and round (carry = MSB of guard)
    lsl     B3
    wmov    C2, CC2
    wmov    C0, CC0
    clr     __zero_reg__        ; was used as CC1; restore the ABI zero
    adc     C0, __zero_reg__
    adc     C1, __zero_reg__
    adc     C2, __zero_reg__
    adc     C3, __zero_reg__

    ;; Epilogue
    pop     CC3
    pop     CC2
    ret
ENDF    __mulusa3
#endif  /* L_mulusa3 */
|
||||
|
||||
;; Retire all register names of the 16.16 multiply section.
#undef A0
#undef A1
#undef A2
#undef A3
#undef B0
#undef B1
#undef B2
#undef B3
#undef C0
#undef C1
#undef C2
#undef C3
#undef AA0
#undef AA1
#undef AA2
#undef AA3
#undef CC0
#undef CC1
#undef CC2
#undef CC3

#endif /* __AVR_HAVE_MUL__ */
|
||||
|
||||
/*******************************************************
|
||||
Fractional Division 8 / 8
|
||||
*******************************************************/
|
||||
|
||||
;; Register names for the 8-bit fractional division.
#define r_divd  r25     /* dividend */
#define r_quo   r24     /* quotient */
#define r_div   r22     /* divisor */

#if defined (L_divqq3)
;;; Signed fractional 8 / 8 division (QQ).
;;; Takes absolute values, divides unsigned, then applies the sign
;;; (XOR of the operand signs, kept in bit 7 of r0).
;;; |dividend| == |divisor| would yield +/-1, which QQ cannot hold:
;;; return the most negative value (saturation style of this port).
DEFUN   __divqq3
    mov     r0, r_divd
    eor     r0, r_div           ; r0.7 = sign of the result
    sbrc    r_div, 7
    neg     r_div
    sbrc    r_divd, 7
    neg     r_divd
    cp      r_divd, r_div
    breq    __divqq3_minus1     ; if equal return -1
    XCALL   __udivuqq3
    lsr     r_quo               ; make room for the sign bit
    sbrc    r0, 7               ; negate result if needed
    neg     r_quo
    ret
__divqq3_minus1:
    ldi     r_quo, 0x80
    ret
ENDF    __divqq3
#endif  /* defined (L_divqq3) */
|
||||
|
||||
#if defined (L_udivuqq3)
;;; Unsigned fractional 8 / 8 division (UQQ), restoring long
;;; division over 8 bits.  Requires dividend < divisor.
;;; __zero_reg__ doubles as a one-hot loop counter; it is 0 again
;;; when the loop falls through, restoring the ABI invariant.
DEFUN   __udivuqq3
    clr     r_quo               ; clear quotient
    inc     __zero_reg__        ; init loop counter, used per shift
__udivuqq3_loop:
    lsl     r_divd              ; shift dividend
    brcs    0f                  ; dividend overflow
    cp      r_divd,r_div        ; compare dividend & divisor
    brcc    0f                  ; dividend >= divisor
    rol     r_quo               ; shift quotient (with CARRY)
    rjmp    __udivuqq3_cont
0:
    sub     r_divd,r_div        ; restore dividend
    lsl     r_quo               ; shift quotient (without CARRY)
__udivuqq3_cont:
    lsl     __zero_reg__        ; shift loop-counter bit
    brne    __udivuqq3_loop
    com     r_quo               ; complement result
                                ; because C flag was complemented in loop
    ret
ENDF    __udivuqq3
#endif  /* defined (L_udivuqq3) */

#undef  r_divd
#undef  r_quo
#undef  r_div
|
||||
|
||||
|
||||
/*******************************************************
|
||||
Fractional Division 16 / 16
|
||||
*******************************************************/
|
||||
;; Register names for the 16-bit fractional/accum division.
#define r_divdL 26     /* dividend Low */
#define r_divdH 27     /* dividend High */
#define r_quoL  24     /* quotient Low */
#define r_quoH  25     /* quotient High */
#define r_divL  22     /* divisor Low */
#define r_divH  23     /* divisor High */
#define r_cnt   21

#if defined (L_divhq3)
;;; Signed fractional 16 / 16 division (HQ); same sign-and-magnitude
;;; scheme as __divqq3, with NEG2 negating the 16-bit operands.
DEFUN   __divhq3
    mov     r0, r_divdH
    eor     r0, r_divH          ; r0.7 = sign of the result
    sbrs    r_divH, 7
    rjmp    1f
    NEG2    r_divL
1:
    sbrs    r_divdH, 7
    rjmp    2f
    NEG2    r_divdL
2:
    cp      r_divdL, r_divL
    cpc     r_divdH, r_divH
    breq    __divhq3_minus1     ; if equal return -1
    XCALL   __udivuhq3
    lsr     r_quoH              ; make room for the sign bit
    ror     r_quoL
    brpl    9f
    ;; negate result if needed
    NEG2    r_quoL
9:
    ret
__divhq3_minus1:
    ldi     r_quoH, 0x80
    clr     r_quoL
    ret
ENDF    __divhq3
#endif  /* defined (L_divhq3) */
|
||||
|
||||
#if defined (L_udivuhq3)
;;; Unsigned fractional 16 / 16 division (UHQ).
;;; Clears quotient and carry, then falls through into the common
;;; restoring-division loop shared with __udivuha3.
DEFUN   __udivuhq3
    sub     r_quoH,r_quoH       ; clear quotient and carry
    ;; FALLTHRU
ENDF    __udivuhq3

;;; Common 16-bit restoring-division loop.  Entry carry is shifted
;;; into the dividend first, which is how __udivuha3 injects its
;;; rearranged integer bit.
DEFUN   __udivuha3_common
    clr     r_quoL              ; clear quotient
    ldi     r_cnt,16            ; init loop counter
__udivuhq3_loop:
    rol     r_divdL             ; shift dividend (with CARRY)
    rol     r_divdH
    brcs    __udivuhq3_ep       ; dividend overflow
    cp      r_divdL,r_divL      ; compare dividend & divisor
    cpc     r_divdH,r_divH
    brcc    __udivuhq3_ep       ; dividend >= divisor
    rol     r_quoL              ; shift quotient (with CARRY)
    rjmp    __udivuhq3_cont
__udivuhq3_ep:
    sub     r_divdL,r_divL      ; restore dividend
    sbc     r_divdH,r_divH
    lsl     r_quoL              ; shift quotient (without CARRY)
__udivuhq3_cont:
    rol     r_quoH              ; shift quotient
    dec     r_cnt               ; decrement loop counter
    brne    __udivuhq3_loop
    com     r_quoL              ; complement result
    com     r_quoH              ; because C flag was complemented in loop
    ret
ENDF    __udivuha3_common
#endif  /* defined (L_udivuhq3) */
|
||||
|
||||
/*******************************************************
|
||||
Fixed Division 8.8 / 8.8
|
||||
*******************************************************/
|
||||
#if defined (L_divha3)
;;; Signed 8.8 / 8.8 fixed division (HA): sign-and-magnitude
;;; wrapper around __udivuha3, like __divhq3 but without the
;;; equal-operands saturation (HA can represent +/-1).
DEFUN   __divha3
    mov     r0, r_divdH
    eor     r0, r_divH          ; r0.7 = sign of the result
    sbrs    r_divH, 7
    rjmp    1f
    NEG2    r_divL
1:
    sbrs    r_divdH, 7
    rjmp    2f
    NEG2    r_divdL
2:
    XCALL   __udivuha3
    sbrs    r0, 7               ; negate result if needed
    ret
    NEG2    r_quoL
    ret
ENDF    __divha3
#endif  /* defined (L_divha3) */
|
||||
|
||||
#if defined (L_udivuha3)
;;; Unsigned 8.8 / 8.8 fixed division (UHA): rearrange the operand
;;; bytes so the integer byte enters the loop first, then reuse the
;;; fractional division loop.
DEFUN   __udivuha3
    mov     r_quoH, r_divdL
    mov     r_divdL, r_divdH
    clr     r_divdH
    lsl     r_quoH              ; shift quotient into carry
    XJMP    __udivuha3_common   ; same as fractional after rearrange
ENDF    __udivuha3
#endif  /* defined (L_udivuha3) */

#undef  r_divdL
#undef  r_divdH
#undef  r_quoL
#undef  r_quoH
#undef  r_divL
#undef  r_divH
#undef  r_cnt
|
||||
|
||||
/*******************************************************
|
||||
Fixed Division 16.16 / 16.16
|
||||
*******************************************************/
|
||||
|
||||
;; Register names for the 16.16 division.  Note the overlaps: the
;; incoming argument (r_arg1*) and the working dividend (r_divd*)
;; partly share registers, which __udivusa3 resolves by shuffling.
#define r_arg1L  24    /* arg1 gets passed already in place */
#define r_arg1H  25
#define r_arg1HL 26
#define r_arg1HH 27
#define r_divdL  26    /* dividend Low */
#define r_divdH  27
#define r_divdHL 30
#define r_divdHH 31    /* dividend High */
#define r_quoL   22    /* quotient Low */
#define r_quoH   23
#define r_quoHL  24
#define r_quoHH  25    /* quotient High */
#define r_divL   18    /* divisor Low */
#define r_divH   19
#define r_divHL  20
#define r_divHH  21    /* divisor High */
#define r_cnt __zero_reg__  /* loop count (0 after the loop!) */

#if defined (L_divsa3)
;;; Signed 16.16 / 16.16 fixed division (SA): sign-and-magnitude
;;; wrapper around __udivusa3, NEG4 negating the 32-bit operands.
DEFUN   __divsa3
    mov     r0, r_arg1HH
    eor     r0, r_divHH         ; r0.7 = sign of the result
    sbrs    r_divHH, 7
    rjmp    1f
    NEG4    r_divL
1:
    sbrs    r_arg1HH, 7
    rjmp    2f
    NEG4    r_arg1L
2:
    XCALL   __udivusa3
    sbrs    r0, 7               ; negate result if needed
    ret
    NEG4    r_quoL
    ret
ENDF    __divsa3
#endif  /* defined (L_divsa3) */
|
||||
|
||||
#if defined (L_udivusa3)
;;; Unsigned 16.16 / 16.16 fixed division (USA): 32-step restoring
;;; division.  The 16 integer bits of the dividend are pre-shifted
;;; into the working registers; __zero_reg__ is the loop counter and
;;; is 0 again when the loop ends (preserving the ABI invariant).
DEFUN   __udivusa3
    ldi     r_divdHL, 32        ; init loop counter
    mov     r_cnt, r_divdHL
    clr     r_divdHL
    clr     r_divdHH
    wmov    r_quoL, r_divdHL    ; clear quotient low word
    lsl     r_quoHL             ; shift quotient into carry
    rol     r_quoHH
__udivusa3_loop:
    rol     r_divdL             ; shift dividend (with CARRY)
    rol     r_divdH
    rol     r_divdHL
    rol     r_divdHH
    brcs    __udivusa3_ep       ; dividend overflow
    cp      r_divdL,r_divL      ; compare dividend & divisor
    cpc     r_divdH,r_divH
    cpc     r_divdHL,r_divHL
    cpc     r_divdHH,r_divHH
    brcc    __udivusa3_ep       ; dividend >= divisor
    rol     r_quoL              ; shift quotient (with CARRY)
    rjmp    __udivusa3_cont
__udivusa3_ep:
    sub     r_divdL,r_divL      ; restore dividend
    sbc     r_divdH,r_divH
    sbc     r_divdHL,r_divHL
    sbc     r_divdHH,r_divHH
    lsl     r_quoL              ; shift quotient (without CARRY)
__udivusa3_cont:
    rol     r_quoH              ; shift quotient
    rol     r_quoHL
    rol     r_quoHH
    dec     r_cnt               ; decrement loop counter
    brne    __udivusa3_loop
    com     r_quoL              ; complement result
    com     r_quoH              ; because C flag was complemented in loop
    com     r_quoHL
    com     r_quoHH
    ret
ENDF    __udivusa3
#endif  /* defined (L_udivusa3) */

#undef  r_arg1L
#undef  r_arg1H
#undef  r_arg1HL
#undef  r_arg1HH
#undef  r_divdL
#undef  r_divdH
#undef  r_divdHL
#undef  r_divdHH
#undef  r_quoL
#undef  r_quoH
#undef  r_quoHL
#undef  r_quoHH
#undef  r_divL
#undef  r_divH
#undef  r_divHL
#undef  r_divHH
#undef  r_cnt
|
@ -91,6 +91,35 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
|
||||
.endfunc
|
||||
.endm
|
||||
|
||||
;; Negate a 2-byte value held in consecutive registers.
;; Classic multi-byte two's complement: complement the high byte,
;; NEG the low byte (sets carry unless the low byte is 0), then
;; SBCI ..., -1 adds 1 minus that borrow to the high byte.
;; Requires \reg+1 >= r16 (SBCI needs an upper register).
.macro  NEG2  reg
    com     \reg+1
    neg     \reg
    sbci    \reg+1, -1
.endm
|
||||
|
||||
;; Negate a 4-byte value held in consecutive registers.
;; Upper-register variant uses NEG + SBCI ..., -1 as in NEG2.
;; For \reg < 16 SBCI cannot be encoded, so complement every byte
;; and propagate the +1 with ADC: COM leaves carry set, so the
;; first ADC adds 1 and the chain carries it upward.
.macro  NEG4  reg
    com     \reg+3
    com     \reg+2
    com     \reg+1
.if \reg >= 16
    neg     \reg
    sbci    \reg+1, -1
    sbci    \reg+2, -1
    sbci    \reg+3, -1
.else
    com     \reg
    adc     \reg,   __zero_reg__
    adc     \reg+1, __zero_reg__
    adc     \reg+2, __zero_reg__
    adc     \reg+3, __zero_reg__
.endif
.endm
|
||||
|
||||
;; Bytes 2 resp. 3 of an IEEE single whose biased exponent field is N:
;; used by the fixed <-> float routines to scale by 2^N by adding to /
;; subtracting from the exponent of a nonzero float.
#define exp_lo(N)  hlo8 ((N) << 23)
#define exp_hi(N)  hhi8 ((N) << 23)
|
||||
|
||||
|
||||
.section .text.libgcc.mul, "ax", @progbits
|
||||
|
||||
@ -126,175 +155,246 @@ ENDF __mulqi3
|
||||
|
||||
#endif /* defined (L_mulqi3) */
|
||||
|
||||
#if defined (L_mulqihi3)
|
||||
DEFUN __mulqihi3
|
||||
clr r25
|
||||
sbrc r24, 7
|
||||
dec r25
|
||||
clr r23
|
||||
sbrc r22, 7
|
||||
dec r22
|
||||
XJMP __mulhi3
|
||||
ENDF __mulqihi3:
|
||||
#endif /* defined (L_mulqihi3) */
|
||||
|
||||
#if defined (L_umulqihi3)
|
||||
DEFUN __umulqihi3
|
||||
clr r25
|
||||
clr r23
|
||||
XJMP __mulhi3
|
||||
ENDF __umulqihi3
|
||||
#endif /* defined (L_umulqihi3) */
|
||||
|
||||
/*******************************************************
|
||||
Widening Multiplication 16 = 8 x 8 without MUL
|
||||
Multiplication 16 x 16 without MUL
|
||||
*******************************************************/
|
||||
|
||||
#define A0 r22
|
||||
#define A1 r23
|
||||
#define B0 r24
|
||||
#define BB0 r20
|
||||
#define B1 r25
|
||||
;; Output overlaps input, thus expand result in CC0/1
|
||||
#define C0 r24
|
||||
#define C1 r25
|
||||
#define CC0 __tmp_reg__
|
||||
#define CC1 R21
|
||||
|
||||
#if defined (L_umulqihi3)
;;; R25:R24 = (unsigned int) R22 * (unsigned int) R24
;;; (C1:C0) = (unsigned int) A0 * (unsigned int) B0
;;; Clobbers: __tmp_reg__, R21..R23
;;; Zero-extend both 8-bit operands and tail-call the 16-bit multiply.
DEFUN   __umulqihi3
    clr     A1
    clr     B1
    XJMP    __mulhi3
ENDF    __umulqihi3
#endif  /* L_umulqihi3 */
|
||||
|
||||
#if defined (L_mulqihi3)
;;; R25:R24 = (signed int) R22 * (signed int) R24
;;; (C1:C0) = (signed int) A0 * (signed int) B0
;;; Clobbers: __tmp_reg__, R20..R23
;;; Sign-extends B, but only ZERO-extends A (the shift-and-add
;;; __mulhi3 is twice as fast with a zero high byte) and fixes up
;;; the product afterwards: for A < 0, signed A = unsigned A - 256,
;;; so subtract B from the high byte (sub C1, BB0).
DEFUN   __mulqihi3
    ;; Sign-extend B0
    clr     B1
    sbrc    B0, 7
    com     B1
    ;; The multiplication runs twice as fast if A1 is zero, thus:
    ;; Zero-extend A0
    clr     A1
#ifdef __AVR_HAVE_JMP_CALL__
    ;; Store  B0 * sign of A
    clr     BB0
    sbrc    A0, 7
    mov     BB0, B0
    call    __mulhi3
#else /* have no CALL */
    ;; Skip sign-extension of A if A >= 0
    ;; Same size as with the first alternative but avoids errata skip
    ;; and is faster if A >= 0
    sbrs    A0, 7
    rjmp    __mulhi3            ; tail call: no fixup needed for A >= 0
    ;; If  A < 0  store B
    mov     BB0, B0
    rcall   __mulhi3
#endif  /* HAVE_JMP_CALL */
    ;; 1-extend A after the multiplication
    sub     C1, BB0
    ret
ENDF    __mulqihi3
#endif  /* L_mulqihi3 */
|
||||
|
||||
#if defined (L_mulhi3)
|
||||
#define r_arg1L r24 /* multiplier Low */
|
||||
#define r_arg1H r25 /* multiplier High */
|
||||
#define r_arg2L r22 /* multiplicand Low */
|
||||
#define r_arg2H r23 /* multiplicand High */
|
||||
#define r_resL __tmp_reg__ /* result Low */
|
||||
#define r_resH r21 /* result High */
|
||||
|
||||
;;; R25:R24 = R23:R22 * R25:R24
|
||||
;;; (C1:C0) = (A1:A0) * (B1:B0)
|
||||
;;; Clobbers: __tmp_reg__, R21..R23
|
||||
DEFUN __mulhi3
|
||||
clr r_resH ; clear result
|
||||
clr r_resL ; clear result
|
||||
__mulhi3_loop:
|
||||
sbrs r_arg1L,0
|
||||
rjmp __mulhi3_skip1
|
||||
add r_resL,r_arg2L ; result + multiplicand
|
||||
adc r_resH,r_arg2H
|
||||
__mulhi3_skip1:
|
||||
add r_arg2L,r_arg2L ; shift multiplicand
|
||||
adc r_arg2H,r_arg2H
|
||||
|
||||
cp r_arg2L,__zero_reg__
|
||||
cpc r_arg2H,__zero_reg__
|
||||
breq __mulhi3_exit ; while multiplicand != 0
|
||||
;; Clear result
|
||||
clr CC0
|
||||
clr CC1
|
||||
rjmp 3f
|
||||
1:
|
||||
;; Bit n of A is 1 --> C += B << n
|
||||
add CC0, B0
|
||||
adc CC1, B1
|
||||
2:
|
||||
lsl B0
|
||||
rol B1
|
||||
3:
|
||||
;; If B == 0 we are ready
|
||||
sbiw B0, 0
|
||||
breq 9f
|
||||
|
||||
lsr r_arg1H ; gets LSB of multiplier
|
||||
ror r_arg1L
|
||||
sbiw r_arg1L,0
|
||||
brne __mulhi3_loop ; exit if multiplier = 0
|
||||
__mulhi3_exit:
|
||||
mov r_arg1H,r_resH ; result to return register
|
||||
mov r_arg1L,r_resL
|
||||
ret
|
||||
ENDF __mulhi3
|
||||
;; Carry = n-th bit of A
|
||||
lsr A1
|
||||
ror A0
|
||||
;; If bit n of A is set, then go add B * 2^n to C
|
||||
brcs 1b
|
||||
|
||||
#undef r_arg1L
|
||||
#undef r_arg1H
|
||||
#undef r_arg2L
|
||||
#undef r_arg2H
|
||||
#undef r_resL
|
||||
#undef r_resH
|
||||
;; Carry = 0 --> The ROR above acts like CP A0, 0
|
||||
;; Thus, it is sufficient to CPC the high part to test A against 0
|
||||
cpc A1, __zero_reg__
|
||||
;; Only proceed if A != 0
|
||||
brne 2b
|
||||
9:
|
||||
;; Move Result into place
|
||||
mov C0, CC0
|
||||
mov C1, CC1
|
||||
ret
|
||||
ENDF __mulhi3
|
||||
#endif /* L_mulhi3 */
|
||||
|
||||
#endif /* defined (L_mulhi3) */
|
||||
#undef A0
|
||||
#undef A1
|
||||
#undef B0
|
||||
#undef BB0
|
||||
#undef B1
|
||||
#undef C0
|
||||
#undef C1
|
||||
#undef CC0
|
||||
#undef CC1
|
||||
|
||||
|
||||
#define A0 22
|
||||
#define A1 A0+1
|
||||
#define A2 A0+2
|
||||
#define A3 A0+3
|
||||
|
||||
#define B0 18
|
||||
#define B1 B0+1
|
||||
#define B2 B0+2
|
||||
#define B3 B0+3
|
||||
|
||||
#define CC0 26
|
||||
#define CC1 CC0+1
|
||||
#define CC2 30
|
||||
#define CC3 CC2+1
|
||||
|
||||
#define C0 22
|
||||
#define C1 C0+1
|
||||
#define C2 C0+2
|
||||
#define C3 C0+3
|
||||
|
||||
/*******************************************************
|
||||
Widening Multiplication 32 = 16 x 16 without MUL
|
||||
*******************************************************/
|
||||
|
||||
#if defined (L_mulhisi3)
|
||||
DEFUN __mulhisi3
|
||||
;;; FIXME: This is dead code (noone calls it)
|
||||
mov_l r18, r24
|
||||
mov_h r19, r25
|
||||
clr r24
|
||||
sbrc r23, 7
|
||||
dec r24
|
||||
mov r25, r24
|
||||
clr r20
|
||||
sbrc r19, 7
|
||||
dec r20
|
||||
mov r21, r20
|
||||
XJMP __mulsi3
|
||||
ENDF __mulhisi3
|
||||
#endif /* defined (L_mulhisi3) */
|
||||
|
||||
#if defined (L_umulhisi3)
|
||||
DEFUN __umulhisi3
|
||||
;;; FIXME: This is dead code (noone calls it)
|
||||
mov_l r18, r24
|
||||
mov_h r19, r25
|
||||
clr r24
|
||||
clr r25
|
||||
mov_l r20, r24
|
||||
mov_h r21, r25
|
||||
wmov B0, 24
|
||||
;; Zero-extend B
|
||||
clr B2
|
||||
clr B3
|
||||
;; Zero-extend A
|
||||
wmov A2, B2
|
||||
XJMP __mulsi3
|
||||
ENDF __umulhisi3
|
||||
#endif /* defined (L_umulhisi3) */
|
||||
#endif /* L_umulhisi3 */
|
||||
|
||||
#if defined (L_mulhisi3)
|
||||
DEFUN __mulhisi3
|
||||
wmov B0, 24
|
||||
;; Sign-extend B
|
||||
lsl r25
|
||||
sbc B2, B2
|
||||
mov B3, B2
|
||||
#ifdef __AVR_ERRATA_SKIP_JMP_CALL__
|
||||
;; Sign-extend A
|
||||
clr A2
|
||||
sbrc A1, 7
|
||||
com A2
|
||||
mov A3, A2
|
||||
XJMP __mulsi3
|
||||
#else /* no __AVR_ERRATA_SKIP_JMP_CALL__ */
|
||||
;; Zero-extend A and __mulsi3 will run at least twice as fast
|
||||
;; compared to a sign-extended A.
|
||||
clr A2
|
||||
clr A3
|
||||
sbrs A1, 7
|
||||
XJMP __mulsi3
|
||||
;; If A < 0 then perform the B * 0xffff.... before the
|
||||
;; very multiplication by initializing the high part of the
|
||||
;; result CC with -B.
|
||||
wmov CC2, A2
|
||||
sub CC2, B0
|
||||
sbc CC3, B1
|
||||
XJMP __mulsi3_helper
|
||||
#endif /* __AVR_ERRATA_SKIP_JMP_CALL__ */
|
||||
ENDF __mulhisi3
|
||||
#endif /* L_mulhisi3 */
|
||||
|
||||
|
||||
#if defined (L_mulsi3)
|
||||
/*******************************************************
|
||||
Multiplication 32 x 32 without MUL
|
||||
*******************************************************/
|
||||
#define r_arg1L r22 /* multiplier Low */
|
||||
#define r_arg1H r23
|
||||
#define r_arg1HL r24
|
||||
#define r_arg1HH r25 /* multiplier High */
|
||||
|
||||
#define r_arg2L r18 /* multiplicand Low */
|
||||
#define r_arg2H r19
|
||||
#define r_arg2HL r20
|
||||
#define r_arg2HH r21 /* multiplicand High */
|
||||
|
||||
#define r_resL r26 /* result Low */
|
||||
#define r_resH r27
|
||||
#define r_resHL r30
|
||||
#define r_resHH r31 /* result High */
|
||||
|
||||
#if defined (L_mulsi3)
|
||||
DEFUN __mulsi3
|
||||
clr r_resHH ; clear result
|
||||
clr r_resHL ; clear result
|
||||
clr r_resH ; clear result
|
||||
clr r_resL ; clear result
|
||||
__mulsi3_loop:
|
||||
sbrs r_arg1L,0
|
||||
rjmp __mulsi3_skip1
|
||||
add r_resL,r_arg2L ; result + multiplicand
|
||||
adc r_resH,r_arg2H
|
||||
adc r_resHL,r_arg2HL
|
||||
adc r_resHH,r_arg2HH
|
||||
__mulsi3_skip1:
|
||||
add r_arg2L,r_arg2L ; shift multiplicand
|
||||
adc r_arg2H,r_arg2H
|
||||
adc r_arg2HL,r_arg2HL
|
||||
adc r_arg2HH,r_arg2HH
|
||||
|
||||
lsr r_arg1HH ; gets LSB of multiplier
|
||||
ror r_arg1HL
|
||||
ror r_arg1H
|
||||
ror r_arg1L
|
||||
brne __mulsi3_loop
|
||||
sbiw r_arg1HL,0
|
||||
cpc r_arg1H,r_arg1L
|
||||
brne __mulsi3_loop ; exit if multiplier = 0
|
||||
__mulsi3_exit:
|
||||
mov_h r_arg1HH,r_resHH ; result to return register
|
||||
mov_l r_arg1HL,r_resHL
|
||||
mov_h r_arg1H,r_resH
|
||||
mov_l r_arg1L,r_resL
|
||||
ret
|
||||
ENDF __mulsi3
|
||||
;; Clear result
|
||||
clr CC2
|
||||
clr CC3
|
||||
;; FALLTHRU
|
||||
ENDF __mulsi3
|
||||
|
||||
#undef r_arg1L
|
||||
#undef r_arg1H
|
||||
#undef r_arg1HL
|
||||
#undef r_arg1HH
|
||||
|
||||
#undef r_arg2L
|
||||
#undef r_arg2H
|
||||
#undef r_arg2HL
|
||||
#undef r_arg2HH
|
||||
|
||||
#undef r_resL
|
||||
#undef r_resH
|
||||
#undef r_resHL
|
||||
#undef r_resHH
|
||||
DEFUN __mulsi3_helper
|
||||
clr CC0
|
||||
clr CC1
|
||||
rjmp 3f
|
||||
|
||||
#endif /* defined (L_mulsi3) */
|
||||
1: ;; If bit n of A is set, then add B * 2^n to the result in CC
|
||||
;; CC += B
|
||||
add CC0,B0 $ adc CC1,B1 $ adc CC2,B2 $ adc CC3,B3
|
||||
|
||||
2: ;; B <<= 1
|
||||
lsl B0 $ rol B1 $ rol B2 $ rol B3
|
||||
|
||||
3: ;; A >>= 1: Carry = n-th bit of A
|
||||
lsr A3 $ ror A2 $ ror A1 $ ror A0
|
||||
|
||||
brcs 1b
|
||||
;; Only continue if A != 0
|
||||
sbci A1, 0
|
||||
brne 2b
|
||||
sbiw A2, 0
|
||||
brne 2b
|
||||
|
||||
;; All bits of A are consumed: Copy result to return register C
|
||||
wmov C0, CC0
|
||||
wmov C2, CC2
|
||||
ret
|
||||
ENDF __mulsi3_helper
|
||||
#endif /* L_mulsi3 */
|
||||
|
||||
#undef A0
|
||||
#undef A1
|
||||
#undef A2
|
||||
#undef A3
|
||||
#undef B0
|
||||
#undef B1
|
||||
#undef B2
|
||||
#undef B3
|
||||
#undef C0
|
||||
#undef C1
|
||||
#undef C2
|
||||
#undef C3
|
||||
#undef CC0
|
||||
#undef CC1
|
||||
#undef CC2
|
||||
#undef CC3
|
||||
|
||||
#endif /* !defined (__AVR_HAVE_MUL__) */
|
||||
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
|
||||
@ -316,7 +416,7 @@ ENDF __mulsi3
|
||||
#define C3 C0+3
|
||||
|
||||
/*******************************************************
|
||||
Widening Multiplication 32 = 16 x 16
|
||||
Widening Multiplication 32 = 16 x 16 with MUL
|
||||
*******************************************************/
|
||||
|
||||
#if defined (L_mulhisi3)
|
||||
@ -364,7 +464,17 @@ DEFUN __umulhisi3
|
||||
mul A1, B1
|
||||
movw C2, r0
|
||||
mul A0, B1
|
||||
#ifdef __AVR_HAVE_JMP_CALL__
|
||||
;; This function is used by many other routines, often multiple times.
|
||||
;; Therefore, if the flash size is not too limited, avoid the RCALL
|
||||
;; and inverst 6 Bytes to speed things up.
|
||||
add C1, r0
|
||||
adc C2, r1
|
||||
clr __zero_reg__
|
||||
adc C3, __zero_reg__
|
||||
#else
|
||||
rcall 1f
|
||||
#endif
|
||||
mul A1, B0
|
||||
1: add C1, r0
|
||||
adc C2, r1
|
||||
@ -375,7 +485,7 @@ ENDF __umulhisi3
|
||||
#endif /* L_umulhisi3 */
|
||||
|
||||
/*******************************************************
|
||||
Widening Multiplication 32 = 16 x 32
|
||||
Widening Multiplication 32 = 16 x 32 with MUL
|
||||
*******************************************************/
|
||||
|
||||
#if defined (L_mulshisi3)
|
||||
@ -425,7 +535,7 @@ ENDF __muluhisi3
|
||||
#endif /* L_muluhisi3 */
|
||||
|
||||
/*******************************************************
|
||||
Multiplication 32 x 32
|
||||
Multiplication 32 x 32 with MUL
|
||||
*******************************************************/
|
||||
|
||||
#if defined (L_mulsi3)
|
||||
@ -468,7 +578,7 @@ ENDF __mulsi3
|
||||
#endif /* __AVR_HAVE_MUL__ */
|
||||
|
||||
/*******************************************************
|
||||
Multiplication 24 x 24
|
||||
Multiplication 24 x 24 with MUL
|
||||
*******************************************************/
|
||||
|
||||
#if defined (L_mulpsi3)
|
||||
@ -1247,6 +1357,19 @@ __divmodsi4_exit:
|
||||
ENDF __divmodsi4
|
||||
#endif /* defined (L_divmodsi4) */
|
||||
|
||||
#undef r_remHH
|
||||
#undef r_remHL
|
||||
#undef r_remH
|
||||
#undef r_remL
|
||||
#undef r_arg1HH
|
||||
#undef r_arg1HL
|
||||
#undef r_arg1H
|
||||
#undef r_arg1L
|
||||
#undef r_arg2HH
|
||||
#undef r_arg2HL
|
||||
#undef r_arg2H
|
||||
#undef r_arg2L
|
||||
#undef r_cnt
|
||||
|
||||
/*******************************************************
|
||||
Division 64 / 64
|
||||
@ -2757,9 +2880,7 @@ DEFUN __fmulsu_exit
|
||||
XJMP __fmul
|
||||
1: XCALL __fmul
|
||||
;; C = -C iff A0.7 = 1
|
||||
com C1
|
||||
neg C0
|
||||
sbci C1, -1
|
||||
NEG2 C0
|
||||
ret
|
||||
ENDF __fmulsu_exit
|
||||
#endif /* L_fmulsu */
|
||||
@ -2794,3 +2915,5 @@ ENDF __fmul
|
||||
#undef B1
|
||||
#undef C0
|
||||
#undef C1
|
||||
|
||||
#include "lib1funcs-fixed.S"
|
||||
|
@ -2,6 +2,7 @@ LIB1ASMSRC = avr/lib1funcs.S
|
||||
LIB1ASMFUNCS = \
|
||||
_mulqi3 \
|
||||
_mulhi3 \
|
||||
_mulqihi3 _umulqihi3 \
|
||||
_mulpsi3 _mulsqipsi3 \
|
||||
_mulhisi3 \
|
||||
_umulhisi3 \
|
||||
@ -55,6 +56,24 @@ LIB1ASMFUNCS = \
|
||||
_cmpdi2 _cmpdi2_s8 \
|
||||
_fmul _fmuls _fmulsu
|
||||
|
||||
# Fixed point routines in avr/lib1funcs-fixed.S
|
||||
LIB1ASMFUNCS += \
|
||||
_fractqqsf _fractuqqsf \
|
||||
_fracthqsf _fractuhqsf _fracthasf _fractuhasf \
|
||||
_fractsasf _fractusasf _fractsqsf _fractusqsf \
|
||||
\
|
||||
_fractsfqq _fractsfuqq \
|
||||
_fractsfhq _fractsfuhq _fractsfha _fractsfuha \
|
||||
_fractsfsa _fractsfusa \
|
||||
_mulqq3 \
|
||||
_mulhq3 _muluhq3 \
|
||||
_mulha3 _muluha3 _muluha3_round \
|
||||
_mulsa3 _mulusa3 \
|
||||
_divqq3 _udivuqq3 \
|
||||
_divhq3 _udivuhq3 \
|
||||
_divha3 _udivuha3 \
|
||||
_divsa3 _udivusa3
|
||||
|
||||
LIB2FUNCS_EXCLUDE = \
|
||||
_moddi3 _umoddi3 \
|
||||
_clz
|
||||
@ -81,3 +100,49 @@ libgcc-objects += $(patsubst %,%$(objext),$(hiintfuncs16))
|
||||
ifeq ($(enable_shared),yes)
|
||||
libgcc-s-objects += $(patsubst %,%_s$(objext),$(hiintfuncs16))
|
||||
endif
|
||||
|
||||
|
||||
# Filter out supported conversions from fixed-bit.c
|
||||
|
||||
conv_XY=$(conv)$(mode1)$(mode2)
|
||||
conv_X=$(conv)$(mode)
|
||||
|
||||
# Conversions supported by the compiler
|
||||
|
||||
convf_modes = QI UQI QQ UQQ \
|
||||
HI UHI HQ UHQ HA UHA \
|
||||
SI USI SQ USQ SA USA \
|
||||
DI UDI DQ UDQ DA UDA \
|
||||
TI UTI TQ UTQ TA UTA
|
||||
|
||||
LIB2FUNCS_EXCLUDE += \
|
||||
$(foreach conv,_fract _fractuns,\
|
||||
$(foreach mode1,$(convf_modes),\
|
||||
$(foreach mode2,$(convf_modes),$(conv_XY))))
|
||||
|
||||
# Conversions supported by lib1funcs-fixed.S
|
||||
|
||||
conv_to_sf_modes = QQ UQQ HQ UHQ HA UHA SQ USQ SA USA
|
||||
conv_from_sf_modes = QQ UQQ HQ UHQ HA UHA SA USA
|
||||
|
||||
LIB2FUNCS_EXCLUDE += \
|
||||
$(foreach conv,_fract, \
|
||||
$(foreach mode1,$(conv_to_sf_modes), \
|
||||
$(foreach mode2,SF,$(conv_XY))))
|
||||
|
||||
LIB2FUNCS_EXCLUDE += \
|
||||
$(foreach conv,_fract,\
|
||||
$(foreach mode1,SF,\
|
||||
$(foreach mode2,$(conv_from_sf_modes),$(conv_XY))))
|
||||
|
||||
# Arithmetik supported by the compiler
|
||||
|
||||
allfix_modes = QQ UQQ HQ UHQ HA UHA SQ USQ SA USA DA UDA DQ UDQ TQ UTQ TA UTA
|
||||
|
||||
LIB2FUNCS_EXCLUDE += \
|
||||
$(foreach conv,_add _sub,\
|
||||
$(foreach mode,$(allfix_modes),$(conv_X)3))
|
||||
|
||||
LIB2FUNCS_EXCLUDE += \
|
||||
$(foreach conv,_lshr _ashl _ashr _cmp,\
|
||||
$(foreach mode,$(allfix_modes),$(conv_X)))
|
||||
|
Loading…
Reference in New Issue
Block a user