
If not, can QP be emulated on XMM and YMM registers with small overhead (< 2X slowdown) compared to double precision FP arithmetic?

Thanks,

Nick


Did you mean a "boost over the double precision floating point" instead of a "boost over the quad precision floating point"?

Nick


Thanks,


Are there any plans to incorporate native hardware-based support for quad precision, at least into Xeon processors? The performance of the pure software implementation is generally too slow for our purposes (mostly large, regular financial summation tasks). If not, is there a forum of sorts where one can register interest in hardware support for quad precision in Xeon processors?

Thanks,

Anders


As this would be a long-term project (years), I hope you are working with the current implementations of parallelism.


Hi Anders

If by quad precision you have 128-bit Binary Integer Decimal (BID) in mind, or as a candidate for consideration (BID encoding can deal with rounding and precision propagation issues better than binary FP encoding), Intel's DFP library is a great place to start.

You might want to contact the leader of the Intel DFP library; he may be able to brief you on the future release plan for that library.

http://software.intel.com/en-us/articles/intel-decimal-floating-point-math-library/?wapkw=decimal+fl...

You can also contact me offline to explore potential performance headroom on second and third generation Intel Core processors or Intel Xeon E3 and E5 processors.

Shihjong


*...The performance of the pure software implementation is generally too slow for our purposes - mostly large regular financial summation tasks...*

Could you explain why you need 128-bit precision in that case?

Rounding problems create real trouble in exchange operations, and it would be interesting to understand what your problem is.

Best regards,

Sergey


**>>Could you explain why you need 128-bit precision in that case**

Sometimes it could be useful: when you have to balance speed of execution against precision, and you do not want an arbitrary-precision implementation, which is slower than hardware registers.

For example, Pi is a transcendental number whose expansion never terminates, so range-reduction algorithms could benefit from wider FP registers: a more precise stored value of Pi would map large arguments more accurately into the range suitable for sine calculation.


*>>For example the value of Pi which is transcendental number with infinite precision and it could benefit from the wider fp registers so range-reduction algorithms could provide more accurate mapping of the large arguments to the suitable range of sine calculation.*


You could find plenty of references on the limitations of simply relying on extra precision for range reduction, as the x87 firmware does.


My point was that no matter how much hardware precision you have, you still need a higher-precision range reduction algorithm to support trig functions at your new, higher precision.

If the market demand were seen, no doubt someone would study the feasibility of vector quad precision on future 256- and 512-bit register platforms.


The fastest algorithms use IEEE 754.


**>>I didn't say there is no need for quad precision. All widely used Fortran compilers have it, for example, with a software implementation. The performance deficiency of current quad precision is due as much to the lack of vectorizability as to the lack of a single hardware instruction**

I agree with you on this. We must also ask for what purpose the hardware and ISA should be modified to implement quad precision or even more. I suppose that there are not many mainstream math or engineering applications that need to calculate quad-precision transcendental function values. For those esoteric applications, or for highly sophisticated math packages (Mathematica, Matlab) which calculate trig functions with arbitrary precision, the memory array model will be the best implementation, albeit at the price of speed of execution.

**>>you still need a higher precision range reduction algorithm to support trig functions on your new high precision**

It is a catch-22 situation.


*I didn't say there is no need for quad precision...*

Borland C++ compiler v5.x includes a **BCD Number Library**, and it allows you to work with numbers up to 5,000 digits. The question is:

Should I wait for hardware support of 256-bit or 512-bit precision if some workaround could be used?

Also, having worked in the financial industry for many years, I can say that accuracy of calculations is more important than speed.


**>>Borland C++ compiler v5.x includes a BCD Number Library and it allows to work with numbers up to 5,000 digits.**

Java also has two arbitrary-precision classes: BigInteger and BigDecimal. But it is unintuitive to work with these classes, because numbers are represented by objects rather than by numerical primitives like float or int, so simple arithmetic operations are done on objects. You therefore have a large memory overhead to store them, and calculations are very slow, even hundreds of times slower than arithmetic on primitive types.

The question is what kind of applications, besides some esoteric pure-mathematics software which calculates Pi to thousands of digits and sophisticated math packages like Mathematica, need such precision.


In e.g. C++ you have structs/classes without necessarily needing heap space, and you have operator overloading. Also, some C(++) compilers (e.g. gcc: __int128) allow for 128-bit integers. Intel's C compiler also knows about some kind of 128-bit floats which are emulated quite efficiently.


The business problem is that 14-15 significant digits are not enough to retain sufficient precision for amounts of money in bookkeeping applications where large transaction volumes are totaled up on a regular basis. The problem is particularly apparent when adding low unit value currencies such as VND or IDR.

The numbers in play are not typically integers. For simple summation they could be shifted a few digits, but this would only solve some of the applications and would add to the overall complexity.

Our current solution under investigation is a software implementation of a 128-bit decimal type based on IEEE 754-2008. The performance so far is 1-2 orders of magnitude slower than the corresponding 64-bit data types currently used.

Since our software is deployed in Windows environments, the only alternative to a software implementation I currently see is the FPGA route. But that is not particularly attractive, as the FPGA hardware would have to be installed in bulk on servers in outsourced data centres at substantial cost.

I'm aware that asking for 128-bit precision support at CPU level is a request for the long term. However, with the performance penalty we currently see from the software implementation, it is clear that while it may work in limited areas for a while, it will never be something we or our clients will be happy with.

Thanks


*...The problem is particularly apparent when adding low unit value currencies such as VND or IDR. The numbers in play are not typically integers, although for simple summation they could be shifted a few digits but this would only solve some of the applications and thus adds to the overall complexity...*

[**SergeyK**] **VND** is the Vietnamese Dong, and the current exchange rate is about 4.8*10^-5 USD (0.0000479846 USD per VND).

Believe me, the currency exchange problem has been solved for years (since the first PCs appeared at banks) by introducing a normalization factor (or a CurrencyUnit), and in the case of **VND** it has to be equal to 10^5.

Another way is to do calculations in a base currency, usually USD (100 USD = 100 / 0.0000479846 VND, about 2,084,000 VND).

I would like to repeat that something is really wrong conceptually with the way summations are done in your software. It could also be related to an inefficient database design.

*...I currently see is the FPGA route. But that's not particularly attractive as FPGA hardware it would have to be installed in bulk on servers is outsourced data centres at substantial cost...*

[**SergeyK**] That "FPGA solution" is clearly not the best one. I make that statement because I worked as a C++ software developer for the financial industry for more than 8 years and was involved in the design and implementation of several financial systems (two of them were certified at a national bank of some country).

Best regards,

Sergey
