One of the problems of many big int implementation...
# mathematics
z
One of the problems of many big int implementations is that they work extremely slow for not so big numbers.. I think it is a common case when number of really big numbers operations is slow so real numbers can be used with longs in most cases and overflows are rare. So, I created a demo to demonstrate it. I didn't compare with KMath's implementation but compared with java.lang.Math's one. Even when I give no helping info, it is about 4 times faster. When I used explicit
LongBased
type, it became 10 faster than generic one. The speed was the same with just manual long checking. This is because I used inline types. However, it is still a lot slower than just pure usage of
Int
s and `Long`s. But that may be achieved if the code is run under HotSpot and it decides to use intrinsic there. Here is the repo: https://github.com/zhelenskiy/BigInteger/tree/main.
I even started implementing normal array-based big ints there but got several internal backend errors and created issues so I am waiting for their resolution now.
i
The link is broken
z
@Iaroslav Postovalov Try again, please. I accidentally created the repo as private.
i
If your use case needs big integer which has to work fast both for small and large numbers, you can create your own wrapper type which checks if the number inside is in the limits of
long
and stores it as
long
. By extending
sealed interface
idea you can implement arithmetic functions as extensions to
sealed interface BigInteger
which are matching the number to
LongBasedBigInteger
or something else; however, there may be some issues with comparing the number and detecting overflows.
I also think that it is obvious that primitive
long
type is much faster than JDK
BigInteger
.
a
As I said earlier, any contributions are welcome. KMath allows several differen sets of operations on one type simultaniously, so it is is even possible to have differenr algrbras with the same structure representation. It would be nice to have bigint coded by an inline (JvmInline value) class over byte-array. It could help a lot with arrays.
z
Isn't IntArray better here? I think we can divide our big int into pieces of 4 bytes instead of 1: this is more effective. Furthermore, it is still not difficult to take overflowed part as we can convert to long which is of 8 byte. What do you think about it?
a
Does not matter, we use intArray now. But there is also a flag byte. It would be better to have a uniform array instead
z
What is the flag byte?
it is used for the sign
z
Why don't you use a enum?
a
It won't really help. But my point is that there should not be any additional sign object at all. I am not the one who wrote that part, it was contributed.
z
How do you expect that to be done?
I think that using int is an overkill
a
Probably use first int in the array for sign. We will loose several bytes in size, but save a lot more in pointers and memory indicection
z
Oh, I see your point. I agree with you.
If you store digits this way, you cannot guarantee that there are no trailing zeroes: 2^31 is too big for
IntArray
of size 1, but if I use the 0-th element as UInt, I have no overflow, so next Int will be 0, trailing zero.
The rule of no trailing zeroes may be better applied to all zeroes but last. Another way to do is to invent optimization of trailng zeroes: we often have some zeroes after some operations that became trailing. So now we need to recreate IntArray without them. I think we can store capacity which is not more
Int.MAX_VALUE
. Then we can store the flag with with capacity using bitmask.
I tried to implement such built-in signed numbers and to benchmark their summing speed but it is 4 times less than java's one (but still a lot better for small ones). This is not effect of using
LongBased
optimization because direct calls to the unoptimized code has the same low speed. However, it may be connected with that I do something wrong. I haven't published the version yet. Small numbers
Copy code
java.lang.Math.BigInteger:		2.18s		2.04s		1.84s		2.67s		1.86s
Kotlin (Generic Long based):	1.89s		1.41s		1.44s		1.81s		1.69s
Kotlin (Explicit Long based):	63.1ms		67.4ms		48.8ms		55.1ms		51.3ms
Kotlin (Generic Array based):	1.34s		1.52s		2.08s		1.59s		1.42s
Kotlin (Explicit Array based):	6.41s		5.98s		7.10s		6.58s		6.32s
Pure Int:						2.57ms		2.32ms		596us		7.31us		8.70us
Pure Long:						6.65ms		8.52ms		5.57ms		4.58ms		4.28ms
Checked Long:					37.3ms		33.4ms		85.0ms		77.7ms		72.1ms
For big numbers (kotlin array = direct call to array based):
Copy code
Kotlin: 14.7ms		Kotlin Array: 4.49ms		Java: 9.92ms
Kotlin: 23.1ms		Kotlin Array: 5.02ms		Java: 3.20ms
Kotlin: 1.87ms		Kotlin Array: 1.92ms		Java: 582us
Kotlin: 2.17ms		Kotlin Array: 2.06ms		Java: 544us
Kotlin: 5.56ms		Kotlin Array: 5.46ms		Java: 731us
Kotlin: 3.51ms		Kotlin Array: 3.28ms		Java: 778us
Kotlin: 2.88ms		Kotlin Array: 3.75ms		Java: 188us
Kotlin: 2.99ms		Kotlin Array: 2.73ms		Java: 416us
Kotlin: 5.40ms		Kotlin Array: 2.90ms		Java: 341us
Kotlin: 1.60ms		Kotlin Array: 1.55ms		Java: 340us
Should I publish the current version?
a
If you are talking about KMath, you can make a pull request with the code and we can discuss it there, maybe recommend how to imporve things. Of course, the new solution must be better than the one already there (not necessary better than Java BigInt).
z
@Iaroslav Postovalov I accidentally skipped your message. That is actually what is done. The multiplication is not only faster when I access LongBasedBigInt directly, but even when accessing from Generic BigInt.
@altavir The disadvantage of inlining sign into magnitude is that unary minus would become very heavy operation (we would need to copy all magnitude for it) and it would't be a good idea anymore to use it in plus operator or somewhere else. Received minor acceleration inlining IntArray doesn't worth that.
a
Array copy is actually very cheap. Memory indirection is much more expensive
z
Memory copy isn't cheap as magnitude size may be very big
I think that we can make UBigInt (that is semiring or maybe semifield). There are lots of cases (e.g. Cryptography) when numbers are only non-negative. In this cases there is no necessity and advantage to store the sign separately. So UIntArray should be definitely inlined.
Such unsigned bigint can be used inside usual bigint
However, there is still choice: to have one UInt for size (without leading zeroes) or to recreate array without leading zeroes.
a
I would prefer to have some kind of discussion issue about that with use-cases. I do not have any preferences about inner implementation right now, so any ideas and contributions are welcome.
z
So I want to replace
type alias Magnitude = UIntArray
with @JvmInline value class UBigInt(private val array: UIntArray) { private inline operator fun get(index: Int) = array[index] private inline operator fun set(index: Int, UInt) = .... }
In Github?
a
yes
z
(I don't expect getting answer soon there as #general looks a bit flooded)
As equality operator cannot be overridden, I find this idea to be wrong, @altavir 🥺
a
It is a good question. I will start a new thread then.
z
As a result I may notice that the approach should be freezed until overrideable equality operator appearance.
a
Yes. Please open issues if you find any inconsistencies. By the way, equality does not make a lot of sense anyway. Consider floating point for example.
z
What do you mean?
a
I meant that 5.0-1.0 is not necessary equal 4.0.
z
Yes, but 5.toBigInt is expected to be equal to 5.toBigInt
As it is not a double
a
Yes, I agree in that case, but it would be inconsistent if some mathematical objects could be compared for equality and some not.
👍 1
As I mentioned in the other thread, I do not have any clear decision on this point. When we were working with matrices,
equals
approch was proven as inconsistent for now. Partially because of inlines.
z
Maybe there could be some optimizations to fasten equality operator. I'll post a bit later
a
It is possible for immutable objects.
z
Now
a
But first we need to understand if there are use-cases for equality. I can't imagine a lot of them
z
Unfortunately, we cannot remove equality operator, so we need to choose between inconsistency and slowness. I choose the second
a
Well, actually we can. In a number of ways. The question if we want to.
message has been deleted
As an example how it could be done
z
Yes, of course. My idea is about optimizations: we can 5 times compare random digits firstly to fasten case
False
a
It does not make a lot of sense. For BigDecimal you can just compute the hashcode on creation since it is immutable.
But again, do we need it?
a
It is not about that. The question is do we use equality anywhere.
z
@altavir I think, that comparing integers seems to be a natural operation.
a
Maybe. But Use cases for BigInt are not the same for Int.
For example, you would never use BigInt as index
z
Currently, that is my case 😁
a
Could you please describe the use case in an issue? I can't imagine where it would be needed.
z
I want to do some mix of Excel and Jupiter Notebook. I also don't want to give restrictions for max index. But I wouldn't iterate one by one, of course
a
Using BigInt for indexes will immediately impact performance. The much better way is to use parttions instead, so each cell is codede by a partition and index. If for some reason you do not have enough cells (and it is impossible to imagine in notebook environment), you can automatically switch to a different partition,
z
But how to index partitions then?
a
Any way you want. If you take long*long, you will probably cover all the memory in the universe, but you can use any comparable ids for partitions, like strings since you won't be switching between them frequently.
z
I would not store all the used range as an array so I would use indexes as names in map
Or did you mean anything else?
@altavir
a
Yes, but it is better to include optional partition field in indexing and use default partition. But this problem asks for a new 🧵
z
But what to do if I iterate over the next partition? What should I do with indexes?
Maybe, I can have optimization for long-ranged BigInts and that would speed up (not much, unfortunately) evaluation.
a
You will have to use partition switch with performance impact like 1/Long.MAX_VALUE
z
How do you expect indexing there? I misunderstand your concept. Give an example, please.
a
I mean that you have to have pages in your table and when you nearing the end of the page, you are creating a new page. Then you have operations that could be done only in one page and could span different pages, you implement pages as a doubly-linked list (each page remembers previous and next, so you can iterate beyound the end of the page. I still do not see applications for that though.
z
Do you expect some operations to be used only in one span? I mean how do you see table[110000000000000000][12] in your implementation?