//===---------------------------------------------------------------------===// 
// Random ideas for the X86 backend: FP stack related stuff 
//===---------------------------------------------------------------------===// 
 
//===---------------------------------------------------------------------===// 
 
Some targets (e.g. Athlons) prefer ffreep to fstp ST(0):
http://gcc.gnu.org/ml/gcc-patches/2004-04/msg00659.html 
 
//===---------------------------------------------------------------------===// 
 
This should use fiadd on chips where it is profitable: 
double foo(double P, int *I) { return P+*I; } 
 
We have fiadd patterns now, but the following two patterns have the same cost
and complexity. We need a way to specify that the latter is more profitable.
 
def FpADD32m  : FpI<(ops RFP:$dst, RFP:$src1, f32mem:$src2), OneArgFPRW, 
                    [(set RFP:$dst, (fadd RFP:$src1, 
                                     (extloadf64f32 addr:$src2)))]>; 
                // ST(0) = ST(0) + [mem32] 
 
def FpIADD32m : FpI<(ops RFP:$dst, RFP:$src1, i32mem:$src2), OneArgFPRW, 
                    [(set RFP:$dst, (fadd RFP:$src1, 
                                     (X86fild addr:$src2, i32)))]>; 
                // ST(0) = ST(0) + [mem32int] 
 
//===---------------------------------------------------------------------===// 
 
The FP stackifier should handle simple permutations to reduce the number of
shuffle instructions, e.g. turning:
 
fld P	->		fld Q 
fld Q			fld P 
fxch 
 
or: 
 
fxch	->		fucomi 
fucomi			jl X 
jg X 
 
Ideas: 
http://gcc.gnu.org/ml/gcc-patches/2004-11/msg02410.html 
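
The first rewrite above can be checked with a toy model of the register
stack. This is only an illustrative sketch: the FpStack type and the helper
names are hypothetical and are not taken from the stackifier. It relies only
on the fact that fld pushes a value and that fxch with no operand swaps ST(0)
and ST(1).

```c
#include <assert.h>

/* Toy model of the x87 register stack: st[top - 1] plays the role of ST(0). */
typedef struct {
    double st[8];
    int top;
} FpStack;

/* fld: push a value, which becomes the new ST(0). */
static void fld(FpStack *s, double v) {
    assert(s->top < 8);
    s->st[s->top++] = v;
}

/* fxch: swap ST(0) and ST(1). */
static void fxch(FpStack *s) {
    assert(s->top >= 2);
    double tmp = s->st[s->top - 1];
    s->st[s->top - 1] = s->st[s->top - 2];
    s->st[s->top - 2] = tmp;
}

/* Return 1 if both stacks hold the same values in the same slots. */
static int same_stack(const FpStack *a, const FpStack *b) {
    if (a->top != b->top)
        return 0;
    for (int i = 0; i < a->top; ++i)
        if (a->st[i] != b->st[i])
            return 0;
    return 1;
}

/* Compare "fld P; fld Q; fxch" against "fld Q; fld P". */
static int permutation_is_equivalent(double p, double q) {
    FpStack before = {{0}, 0};
    FpStack after = {{0}, 0};

    fld(&before, p);
    fld(&before, q);
    fxch(&before);

    fld(&after, q);
    fld(&after, p);

    return same_stack(&before, &after);
}
```

Both sequences leave P in ST(0) and Q in ST(1), so the fxch can be dropped by
reordering the loads.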
 
 
//===---------------------------------------------------------------------===// 
 
Add a target specific hook to DAG combiner to handle SINT_TO_FP and 
FP_TO_SINT when the source operand is already in memory. 
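
As a sketch of the source shapes such a hook would target (the function names
below are hypothetical), both conversions here take their operand through a
pointer, so the value is already in memory when the conversion node is
created:

```c
/* SINT_TO_FP with the integer source in memory. On the x87, fild can load
   and convert an integer memory operand in one instruction. */
double sint_to_fp_from_mem(const int *src) {
    return (double)*src;
}

/* FP_TO_SINT with the floating-point source in memory. The C cast
   truncates toward zero. */
int fp_to_sint_from_mem(const double *src) {
    return (int)*src;
}
```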
 
//===---------------------------------------------------------------------===// 
 
Open-code rint, floor, ceil, trunc:
http://gcc.gnu.org/ml/gcc-patches/2004-08/msg02006.html 
http://gcc.gnu.org/ml/gcc-patches/2004-08/msg02011.html 
 
Open-code the sincos[f] libcall.
 
//===---------------------------------------------------------------------===// 
 
None of the FPStack instructions are handled in 
X86RegisterInfo::foldMemoryOperand, which prevents the spiller from 
folding spill code into the instructions. 
 
//===---------------------------------------------------------------------===// 
 
Currently the x86 codegen isn't very good at mixing SSE and FPStack 
code: 
 
unsigned int foo(double x) { return x; } 
 
foo: 
	subl $20, %esp 
	movsd 24(%esp), %xmm0 
	movsd %xmm0, 8(%esp) 
	fldl 8(%esp) 
	fisttpll (%esp) 
	movl (%esp), %eax 
	addl $20, %esp 
	ret 
 
Fixing this just requires being smarter when custom-expanding fptoui.
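
The assembly above converts through a signed 64-bit integer (fisttpll) and
then reads back only the low 32 bits. A minimal C sketch of that expansion
follows. The function name is hypothetical, and the sketch assumes the input
lies in [0, 2^32):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring the fisttpll-based expansion: truncate to a
   signed 64-bit integer, then keep the low 32 bits. */
static uint32_t fptoui_via_i64(double x) {
    /* Assumed input range for this sketch. */
    assert(x >= 0.0 && x < 4294967296.0);
    int64_t wide = (int64_t)x;  /* the truncating 64-bit conversion */
    return (uint32_t)wide;      /* the low 32 bits, as read by movl */
}
```

Going through the wider signed type is what lets values of 2^31 and above
convert correctly, since they do not fit in a signed 32-bit result.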
 
//===---------------------------------------------------------------------===//