One of the problems of many big int implementations is that kotlinlang #mathematics

Zhelenskiy

05/06/2021, 4:45 PM

One of the problems of many big int implementations is that they are extremely slow for numbers that are not that big. I think a common case is that operations on really big numbers are rare, so the numbers seen in practice can be handled with longs in most cases and overflows are rare. So I created a demo to show it. I didn't compare it with KMath's implementation, but I did compare it with `java.math.BigInteger`. Even when I give no helping type info, it is about 4 times faster. When I used the explicit `LongBased` type, it became 10 times faster than the generic one. The speed was the same as with plain manual overflow checking on longs. This is because I used inline types. However, it is still a lot slower than plain usage of `Int`s and `Long`s. But that may be achievable if the code runs under HotSpot and it decides to use an intrinsic there. Here is the repo: https://github.com/zhelenskiy/BigInteger/tree/main.

Zhelenskiy

05/06/2021, 4:48 PM

I even started implementing normal array-based big ints there but got several internal backend errors and created issues so I am waiting for their resolution now.

Iaroslav Postovalov

05/06/2021, 5:21 PM

The link is broken

Zhelenskiy

05/06/2021, 5:29 PM

@Iaroslav Postovalov Try again, please. I accidentally created the repo as private.

Iaroslav Postovalov

05/06/2021, 5:36 PM

If your use case needs a big integer that works fast for both small and large numbers, you can create your own wrapper type that checks whether the number inside is within the limits of `long` and stores it as a `long`. By extending the `sealed interface` idea, you can implement arithmetic functions as extensions on `sealed interface BigInteger` that match the number to `LongBasedBigInteger` or something else; however, there may be some issues with comparing numbers and detecting overflows.
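A minimal sketch of the wrapper idea described above, assuming hypothetical names `BigInt`, `SmallInt` and `LargeInt` (none of them come from KMath or the linked repository). Values that fit into a `Long` take a fast path with a manual overflow check; everything else falls back to `java.math.BigInteger` for brevity. Regular data classes are used instead of inline classes to keep the example short.

```kotlin
import java.math.BigInteger

// Hypothetical hierarchy for illustration only.
sealed interface BigInt

// Fast path: the value fits into a Long.
data class SmallInt(val value: Long) : BigInt

// Slow path: delegates to the JDK implementation.
data class LargeInt(val value: BigInteger) : BigInt

// Normalizes a JDK value: anything that fits into a Long becomes a SmallInt.
// bitLength() excludes the sign bit, so 63 bits or fewer fit into a Long.
fun BigInteger.toBigInt(): BigInt =
    if (bitLength() < 64) SmallInt(toLong()) else LargeInt(this)

fun BigInt.toJdk(): BigInteger = when (this) {
    is SmallInt -> BigInteger.valueOf(this.value)
    is LargeInt -> this.value
}

operator fun BigInt.plus(other: BigInt): BigInt {
    if (this is SmallInt && other is SmallInt) {
        val sum = this.value + other.value
        // Signed overflow happened iff both operands differ in sign from the result.
        val overflow = ((this.value xor sum) and (other.value xor sum)) < 0
        if (!overflow) return SmallInt(sum)
    }
    // Rare path: compute with arbitrary precision, then normalize the result.
    return toJdk().add(other.toJdk()).toBigInt()
}
```

As long as instances are created through `toBigInt()` or arithmetic, every number has exactly one representation, so the structural equality of the data classes stays consistent across both variants.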

Iaroslav Postovalov

05/06/2021, 5:40 PM

I also think it is obvious that the primitive `long` type is much faster than the JDK `BigInteger`.

altavir

05/06/2021, 6:36 PM

As I said earlier, any contributions are welcome. KMath allows several different sets of operations on one type simultaneously, so it is even possible to have different algebras with the same structure representation. It would be nice to have a bigint implemented as an inline (JvmInline value) class over a byte array. It could help a lot with arrays.

Zhelenskiy

05/06/2021, 6:44 PM

Isn't IntArray better here? I think we can divide our big int into pieces of 4 bytes instead of 1: this is more efficient. Furthermore, it is still not difficult to take the overflowed part, as we can convert to Long, which is 8 bytes. What do you think about it?
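A small sketch of that point, assuming magnitudes are stored as little-endian 32-bit limbs (least significant first). The helper name `addMagnitudes` is hypothetical; it only illustrates how widening each limb to `Long` exposes the carry.

```kotlin
// Adds two non-negative magnitudes stored as little-endian 32-bit limbs.
// Hypothetical helper; not taken from KMath or the linked repository.
fun addMagnitudes(a: IntArray, b: IntArray): IntArray {
    val (longer, shorter) = if (a.size >= b.size) a to b else b to a
    val result = IntArray(longer.size + 1)
    var carry = 0L
    for (i in longer.indices) {
        // Masking treats each Int as an unsigned 32-bit value inside a Long.
        val x = longer[i].toLong() and 0xFFFFFFFFL
        val y = if (i < shorter.size) shorter[i].toLong() and 0xFFFFFFFFL else 0L
        val sum = x + y + carry
        result[i] = sum.toInt()   // low 32 bits become the limb
        carry = sum ushr 32       // the overflowed part is either 0 or 1
    }
    result[longer.size] = carry.toInt()
    // Drop the extra limb when there was no final carry.
    return if (carry == 0L) result.copyOf(longer.size) else result
}
```

With byte-sized pieces the same loop would run four times as many iterations for the same number.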

altavir

05/06/2021, 6:45 PM

It does not matter, we use IntArray now. But there is also a flag byte. It would be better to have a uniform array instead.

Zhelenskiy

05/06/2021, 6:47 PM

What is the flag byte?

altavir

05/06/2021, 6:48 PM

this one: https://github.com/mipt-npm/kmath/blob/c8fb770ffddb6428790a762e32513414503a2b32/km[…]src/commonMain/kotlin/space/kscience/kmath/operations/BigInt.kt

altavir

05/06/2021, 6:48 PM

it is used for the sign

Zhelenskiy

05/06/2021, 6:49 PM

Why don't you use an enum?

altavir

05/06/2021, 6:49 PM

It won't really help. But my point is that there should not be any additional sign object at all. I am not the one who wrote that part, it was contributed.

Zhelenskiy

05/06/2021, 6:51 PM

How do you expect that to be done?

Zhelenskiy

05/06/2021, 6:51 PM

I think that using an int is overkill

altavir

05/06/2021, 6:52 PM

Probably use the first int in the array for the sign. We will lose several bytes in size, but save a lot more in pointers and memory indirection

Zhelenskiy

05/06/2021, 6:55 PM

Oh, I see your point. I agree with you.

Zhelenskiy

05/06/2021, 10:28 PM

If you store digits this way, you cannot guarantee that there are no trailing zeroes: 2^31 is too big for an `IntArray` of size 1, but if I use the 0-th element as UInt, I have no overflow, so the next Int will be 0, a trailing zero.

Zhelenskiy

05/06/2021, 11:10 PM

The rule of no trailing zeroes may be better applied to all zeroes but the last. Another way is to optimize for trailing zeroes: after some operations we often end up with zeroes that became trailing, so now we need to recreate the IntArray without them. I think we can store the capacity, which is not more than `Int.MAX_VALUE`. Then we can store the flag together with the capacity using a bitmask.
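One way to read the bitmask suggestion, sketched under the assumption that the used length and the sign flag share a single `Int` header (which could, for instance, occupy the first array element, as proposed earlier in the thread). The function names are hypothetical.

```kotlin
// Packs a sign flag and the number of used limbs into one Int header.
// Bit 31 holds the sign; bits 0..30 hold the length (at most Int.MAX_VALUE).
fun packHeader(negative: Boolean, length: Int): Int {
    require(length >= 0) { "length must be non-negative" }
    return if (negative) length or Int.MIN_VALUE else length
}

// The sign bit is the top bit, so a negative header means a negative number.
fun isNegative(header: Int): Boolean = header < 0

// Clearing the top bit leaves only the length.
fun usedLength(header: Int): Int = header and Int.MAX_VALUE
```

With such a header, limbs beyond `usedLength` can remain in the array as unused capacity instead of forcing a reallocation after every operation.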

Zhelenskiy

05/09/2021, 9:49 PM

I tried to implement such built-in signed numbers and benchmarked their summing speed, but it is 4 times slower than Java's (though still a lot better for small numbers). This is not an effect of the `LongBased` optimization, because direct calls to the unoptimized code are just as slow. However, it may be that I am doing something wrong. I haven't published the version yet.

Small numbers:

java.lang.Math.BigInteger:		2.18s		2.04s		1.84s		2.67s		1.86s
Kotlin (Generic Long based):	1.89s		1.41s		1.44s		1.81s		1.69s
Kotlin (Explicit Long based):	63.1ms		67.4ms		48.8ms		55.1ms		51.3ms
Kotlin (Generic Array based):	1.34s		1.52s		2.08s		1.59s		1.42s
Kotlin (Explicit Array based):	6.41s		5.98s		7.10s		6.58s		6.32s
Pure Int:						2.57ms		2.32ms		596us		7.31us		8.70us
Pure Long:						6.65ms		8.52ms		5.57ms		4.58ms		4.28ms
Checked Long:					37.3ms		33.4ms		85.0ms		77.7ms		72.1ms

For big numbers (kotlin array = direct call to array based):

Kotlin: 14.7ms		Kotlin Array: 4.49ms		Java: 9.92ms
Kotlin: 23.1ms		Kotlin Array: 5.02ms		Java: 3.20ms
Kotlin: 1.87ms		Kotlin Array: 1.92ms		Java: 582us
Kotlin: 2.17ms		Kotlin Array: 2.06ms		Java: 544us
Kotlin: 5.56ms		Kotlin Array: 5.46ms		Java: 731us
Kotlin: 3.51ms		Kotlin Array: 3.28ms		Java: 778us
Kotlin: 2.88ms		Kotlin Array: 3.75ms		Java: 188us
Kotlin: 2.99ms		Kotlin Array: 2.73ms		Java: 416us
Kotlin: 5.40ms		Kotlin Array: 2.90ms		Java: 341us
Kotlin: 1.60ms		Kotlin Array: 1.55ms		Java: 340us

Zhelenskiy

05/09/2021, 9:50 PM

Should I publish the current version?

altavir

05/10/2021, 6:16 AM

If you are talking about KMath, you can make a pull request with the code and we can discuss it there, and maybe recommend how to improve things. Of course, the new solution must be better than the one already there (not necessarily better than Java BigInt).

Zhelenskiy

05/10/2021, 1:09 PM

@Iaroslav Postovalov I accidentally skipped your message. That is actually what is done. The multiplication is not only faster when I access LongBasedBigInt directly, but even when accessing from Generic BigInt.

Zhelenskiy

05/10/2021, 3:16 PM

@altavir The disadvantage of inlining the sign into the magnitude is that unary minus would become a very heavy operation (we would need to copy the whole magnitude for it), and it wouldn't be a good idea anymore to use it in the plus operator or anywhere else. The minor speedup gained from inlining the IntArray isn't worth that.

altavir

05/10/2021, 3:31 PM

Array copy is actually very cheap. Memory indirection is much more expensive

Zhelenskiy

05/10/2021, 3:33 PM

Memory copy isn't cheap as magnitude size may be very big
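The trade-off being debated can be sketched as follows, assuming a hypothetical layout in which element 0 of the array is a header whose top bit is the sign. Neither function is taken from KMath; they only contrast the two designs.

```kotlin
// Inlined sign: negation of an immutable number must copy the whole array.
fun negateInlined(number: IntArray): IntArray {
    require(number.isNotEmpty()) { "header element is missing" }
    val copy = number.copyOf()            // O(n): every limb is copied
    copy[0] = copy[0] xor Int.MIN_VALUE   // flip only the sign bit of the header
    return copy
}

// Separate sign: negation shares the magnitude array, which is O(1),
// at the price of one more object and one more pointer hop per access.
class SignedMagnitude(val negative: Boolean, val magnitude: IntArray) {
    operator fun unaryMinus() = SignedMagnitude(!negative, magnitude)
}
```

Which cost dominates depends on how large the magnitudes are and how often negation occurs, so a benchmark on realistic inputs would be needed to settle it.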

Zhelenskiy

05/15/2021, 9:45 AM

I think that we can make a UBigInt (which is a semiring, or maybe a semifield). There are lots of cases (e.g. cryptography) where numbers are only non-negative. In these cases there is no need for, and no advantage in, storing the sign separately. So UIntArray should definitely be inlined.

Zhelenskiy

05/15/2021, 9:47 AM

Such unsigned bigint can be used inside usual bigint

Zhelenskiy

05/15/2021, 9:49 AM

However, there is still a choice: either to have one UInt for the size (without leading zeroes) or to recreate the array without leading zeroes.

altavir

05/15/2021, 9:57 AM

I would prefer to have some kind of discussion issue about that with use-cases. I do not have any preferences about inner implementation right now, so any ideas and contributions are welcome.

Zhelenskiy

05/15/2021, 9:59 AM

So I want to replace `typealias Magnitude = UIntArray` with `@JvmInline value class UBigInt(private val array: UIntArray) { private inline operator fun get(index: Int) = array[index]; private inline operator fun set(index: Int, value: UInt) = .... }`

Zhelenskiy

05/15/2021, 9:59 AM

In Github?

altavir

05/15/2021, 10:19 AM

yes

Zhelenskiy

05/15/2021, 10:36 AM

https://github.com/mipt-npm/kmath/issues/340

😎 1

Zhelenskiy

05/16/2021, 9:34 PM

As it is needed here, I'll reask: https://kotlinlang.slack.com/archives/C0922A726/p1621180402020600

Zhelenskiy

05/16/2021, 9:39 PM

(I don't expect to get an answer there soon, as #general looks a bit flooded)

Zhelenskiy

05/28/2021, 7:38 AM

As the equality operator cannot be overridden, I find this idea to be wrong, @altavir 🥺

altavir

05/28/2021, 7:46 AM

It is a good question. I will start a new thread then.

Zhelenskiy

05/28/2021, 8:26 AM

As a result, I'd note that the approach should be frozen until an overridable equality operator appears.

altavir

05/28/2021, 8:27 AM

Yes. Please open issues if you find any inconsistencies. By the way, equality does not make a lot of sense anyway. Consider floating point for example.

Zhelenskiy

05/28/2021, 8:28 AM

What do you mean?

altavir

05/28/2021, 8:28 AM

I meant that 5.0-1.0 is not necessarily equal to 4.0.

Zhelenskiy

05/28/2021, 8:29 AM

Yes, but 5.toBigInt is expected to be equal to 5.toBigInt

Zhelenskiy

05/28/2021, 8:29 AM

As it is not a double

altavir

05/28/2021, 8:30 AM

Yes, I agree in that case, but it would be inconsistent if some mathematical objects could be compared for equality and some not.

👍 1

altavir

05/28/2021, 8:33 AM

As I mentioned in the other thread, I do not have any clear decision on this point. When we were working with matrices, the `equals` approach proved to be inconsistent for now, partially because of inlines.

Zhelenskiy

05/28/2021, 8:38 AM

Maybe there could be some optimizations to speed up the equality operator. I'll post a bit later

altavir

05/28/2021, 8:39 AM

It is possible for immutable objects.

Zhelenskiy

05/28/2021, 8:39 AM

Now

altavir

05/28/2021, 8:40 AM

But first we need to understand if there are use-cases for equality. I can't imagine a lot of them

Zhelenskiy

05/28/2021, 8:45 AM

Unfortunately, we cannot remove equality operator, so we need to choose between inconsistency and slowness. I choose the second

altavir

05/28/2021, 8:45 AM

Well, actually we can. In a number of ways. The question is whether we want to.

altavir

05/28/2021, 8:48 AM

message has been deleted

altavir

05/28/2021, 8:48 AM

As an example of how it could be done

Zhelenskiy

05/28/2021, 8:51 AM

Yes, of course. My idea is about optimizations: we can first compare 5 random digits to speed up the `False` case

altavir

05/28/2021, 8:52 AM

It does not make a lot of sense. For BigDecimal you can just compute the hashcode on creation since it is immutable.
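A sketch of the cached-hash idea, assuming a regular (non-inline) class so that `equals` can be overridden at all. `ImmutableMagnitude` is a hypothetical name, not a KMath type.

```kotlin
// Hypothetical immutable wrapper around the limbs of a big integer.
class ImmutableMagnitude(source: IntArray) {
    // A defensive copy keeps the instance immutable, so the cached hash stays valid.
    private val limbs: IntArray = source.copyOf()

    // Computed once on creation, as suggested above.
    private val cachedHash: Int = limbs.contentHashCode()

    override fun hashCode(): Int = cachedHash

    override fun equals(other: Any?): Boolean {
        if (this === other) return true
        if (other !is ImmutableMagnitude) return false
        // Different hashes or sizes prove inequality without scanning the limbs.
        if (cachedHash != other.cachedHash || limbs.size != other.limbs.size) return false
        // Equal hashes may still be a collision, so a full comparison is required.
        return limbs.contentEquals(other.limbs)
    }
}
```

The hash check rejects almost all unequal pairs in constant time and, unlike sampling random digits, never needs a fallback for the unequal case; equal values still require a full scan.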

altavir

05/28/2021, 8:52 AM

But again, do we need it?

Zhelenskiy

05/28/2021, 9:00 AM

I actually agree with @elizarov https://kotlinlang.slack.com/archives/CE5HPKBRN/p1622189914024900?thread_ts=1622188176.024000&cid=CE5HPKBRN

altavir

05/28/2021, 9:01 AM

It is not about that. The question is do we use equality anywhere.

Zhelenskiy

05/28/2021, 9:01 AM

@altavir I think, that comparing integers seems to be a natural operation.

altavir

05/28/2021, 9:04 AM

Maybe. But use cases for BigInt are not the same as for Int.

altavir

05/28/2021, 9:04 AM

For example, you would never use BigInt as an index

Zhelenskiy

05/28/2021, 9:04 AM

Currently, that is my case 😁

altavir

05/28/2021, 9:05 AM

Could you please describe the use case in an issue? I can't imagine where it would be needed.

Zhelenskiy

05/28/2021, 9:09 AM

I want to do some mix of Excel and Jupyter Notebook. I also don't want to impose restrictions on the max index. But I wouldn't iterate one by one, of course

altavir

05/28/2021, 9:18 AM

Using BigInt for indexes will immediately impact performance. A much better way is to use partitions instead, so each cell is coded by a partition and an index. If for some reason you do not have enough cells (and it is impossible to imagine in a notebook environment), you can automatically switch to a different partition.

Zhelenskiy

05/28/2021, 9:19 AM

But how to index partitions then?

altavir

05/28/2021, 9:20 AM

Any way you want. If you take long*long, you will probably cover all the memory in the universe, but you can use any comparable ids for partitions, like strings since you won't be switching between them frequently.

Zhelenskiy

05/28/2021, 9:38 AM

I would not store the whole used range as an array, so I would use indexes as keys in a map
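A rough sketch combining the two suggestions: cells live in a sparse map, and each key carries a partition (page) id next to `Long` row and column indexes. All names here (`CellKey`, `PagedTable`, `nextRow`) are hypothetical and only illustrate the addressing scheme.

```kotlin
// Hypothetical key: a cell is addressed by (page, row, column).
data class CellKey(val page: Long, val row: Long, val column: Long)

// Sparse storage: only cells that were actually written occupy memory.
class PagedTable<T> {
    private val cells = HashMap<CellKey, T>()

    operator fun get(page: Long, row: Long, column: Long): T? =
        cells[CellKey(page, row, column)]

    operator fun set(page: Long, row: Long, column: Long, value: T) {
        cells[CellKey(page, row, column)] = value
    }
}

// Moves one row down, switching to the next page only at the very end of a page.
// Overflow of the page id itself is ignored in this sketch.
fun CellKey.nextRow(): CellKey =
    if (row == Long.MAX_VALUE) CellKey(page + 1, 0, column) else copy(row = row + 1)
```

The page switch happens at most once per `Long.MAX_VALUE` steps, so ordinary indexing stays on primitive `Long` arithmetic and the data-class keys get structural equality and hashing for free.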

Zhelenskiy

05/28/2021, 9:44 AM

Or did you mean anything else?

Zhelenskiy

05/28/2021, 9:47 AM

@altavir

altavir

05/28/2021, 9:55 AM

Yes, but it is better to include optional partition field in indexing and use default partition. But this problem asks for a new 🧵

Zhelenskiy

05/28/2021, 9:57 AM

But what to do if I iterate over the next partition? What should I do with indexes?

Zhelenskiy

05/28/2021, 9:58 AM

Maybe, I can have optimization for long-ranged BigInts and that would speed up (not much, unfortunately) evaluation.

altavir

05/28/2021, 10:09 AM

You will have to use partition switch with performance impact like 1/Long.MAX_VALUE

Zhelenskiy

05/28/2021, 11:07 AM

How do you expect indexing to work there? I don't understand your concept. Give an example, please.

altavir

05/28/2021, 11:34 AM

I mean that you have to have pages in your table, and when you are nearing the end of a page, you create a new page. Then you have operations that can be done within one page only and operations that can span different pages. You implement pages as a doubly-linked list (each page remembers the previous and the next one), so you can iterate beyond the end of the page. I still do not see applications for that though.

Zhelenskiy

05/28/2021, 11:39 AM

Do you expect some operations to be used only in one span? I mean how do you see table[110000000000000000][12] in your implementation?
