https://kotlinlang.org logo
Channels
100daysofcode
100daysofkotlin
100daysofkotlin-2021
advent-of-code
aem
ai
alexa
algeria
algolialibraries
amsterdam
android
android-architecture
android-databinding
android-studio
androidgithubprojects
androidthings
androidx
androidx-xprocessing
anime
anko
announcements
apollo-kotlin
appintro
arabic
argentina
arkenv
arksemdevteam
armenia
arrow
arrow-contributors
arrow-meta
ass
atlanta
atm17
atrium
austin
australia
austria
awesome-kotlin
ballast
bangladesh
barcelona
bayarea
bazel
beepiz-libraries
belgium
benchmarks
berlin
big-data
books
boston
brazil
brikk
budapest
build
build-tools
bulgaria
bydgoszcz
cambodia
canada
carrat
carrat-dev
carrat-feed
chicago
chile
china
chucker
cincinnati-user-group
cli
clikt
cloudfoundry
cn
cobalt
code-coverage
codeforces
codemash-precompiler
codereview
codingame
codingconventions
coimbatore
collaborations
colombia
colorado
communities
competitive-programming
competitivecoding
compiler
compose
compose-android
compose-desktop
compose-hiring
compose-ios
compose-mp
compose-ui-showcase
compose-wear
compose-web
confetti
connect-audit-events
corda
cork
coroutines
couchbase
coursera
croatia
cryptography
cscenter-course-2016
cucumber-bdd
cyprus
czech
dagger
data2viz
databinding
datascience
dckotlin
debugging
decompose
decouple
denmark
deprecated
detekt
detekt-hint
dev-core
dfw
docs-revamped
dokka
domain-driven-design
doodle
dsl
dublin
dutch
eap
eclipse
ecuador
edinburgh
education
effective-kotlin
effectivekotlin
emacs
embedded-kotlin
estatik
event21-community-content
events
exposed
failgood
fb-internal-demo
feed
firebase
flow
fluid-libraries
forkhandles
forum
fosdem
fp-in-kotlin
framework-elide
freenode
french
fritz2
fuchsia
functional
funktionale
gamedev
ge-kotlin
general-advice
georgia
geospatial
german-lang
getting-started
github-workflows-kt
glance
godot-kotlin
google-io
gradle
graphic
graphkool
graphql
graphql-kotlin
graviton-browser
greece
grpc
gsoc
gui
hackathons
hacktoberfest
hamburg
hamkrest
helios
helsinki
hexagon
hibernate
hikari-cp
hire-me
hiring
hongkong
hoplite
http4k
hungary
hyderabad
image-processing
india
indonesia
inkremental
intellij
intellij-plugins
intellij-tricks
internships
introduce-yourself
io
ios
iran
israel
istanbulcoders
italian
jackson-kotlin
jadx
japanese
jasync-sql
java-to-kotlin-refactoring
javadevelopers
javafx
javalin
javascript
jdbi
jhipster-kotlin
jobsworldwide
jpa
jshdq
juul-libraries
jvm-ir-backend-feedback
jxadapter
k2-early-adopters
kaal
kafka
kakao
kalasim
kapt
karachi
karg
karlsruhe
kash_shell
kaskade
kbuild
kdbc
kgen-doc-tools
kgraphql
kinta
klaxon
klock
kloudformation
kmdc
kmm-español
kmongo
knbt
knote
koalaql
koans
kobalt
kobweb
kodein
kodex
kohesive
koin
koin-dev
komapper
kondor-json
kong
kontent
kontributors
korau
korean
korge
korim
korio
korlibs
korte
kotest
kotest-contributors
kotless
kotlick
kotlin-asia
kotlin-beam
kotlin-by-example
kotlin-csv
kotlin-data-storage
kotlin-foundation
kotlin-fuel
kotlin-in-action
kotlin-inject
kotlin-latam
kotlin-logging
kotlin-multiplatform-contest
kotlin-mumbai
kotlin-native
kotlin-pakistan
kotlin-plugin
kotlin-pune
kotlin-roadmap
kotlin-samples
kotlin-sap
kotlin-serbia
kotlin-spark
kotlin-szeged
kotlin-website
kotlinacademy
kotlinbot
kotlinconf
kotlindl
kotlinforbeginners
kotlingforbeginners
kotlinlondon
kotlinmad
kotlinprogrammers
kotlinsu
kotlintest
kotlintest-devs
kotlintlv
kotlinultimatechallenge
kotlinx-datetime
kotlinx-files
kotlinx-html
kotrix
kotson
kovenant
kprompt
kraph
krawler
kroto-plus
ksp
ktcc
ktfmt
ktlint
ktor
ktp
kubed
kug-leads
kug-torino
kvision
kweb
lambdaworld_cadiz
lanark
language-evolution
language-proposals
latvia
leakcanary
leedskotlinusergroup
lets-have-fun
libgdx
libkgd
library-development
lincheck
linkeddata
lithuania
london
losangeles
lottie
love
lychee
macedonia
machinelearningbawas
madrid
malaysia
mathematics
meetkotlin
memes
meta
metro-detroit
mexico
miami
micronaut
minnesota
minutest
mirror
mockk
moko
moldova
monsterpuzzle
montreal
moonbean
morocco
motionlayout
mpapt
mu
multiplatform
mumbai
munich
mvikotlin
mvrx
myndocs-oauth2-server
naming
navigation-architecture-component
nepal
new-mexico
new-zealand
newname
nigeria
nodejs
norway
npm-publish
nyc
oceania
ohio-kotlin-users
oldenburg
oolong
opensource
orbit-mvi
osgi
otpisani
package-search
pakistan
panamá
pattern-matching
pbandk
pdx
peru
philippines
phoenix
pinoy
pocketgitclient
polish
popkorn
portugal
practical-functional-programming
proguard
prozis-android-backup
pyhsikal
python
python-contributors
quasar
random
re
react
reaktive
realm
realworldkotlin
reductor
reduks
redux
redux-kotlin
refactoring-to-kotlin
reflect
refreshversions
reports
result
rethink
revolver
rhein-main
rocksdb
romania
room
rpi-pico
rsocket
russian
russian_feed
russian-kotlinasfirst
rx
rxjava
san-diego
science
scotland
scrcast
scrimage
script
scripting
seattle
serialization
server
sg-user-group
singapore
skia-wasm-interop-temp
skrape-it
slovak
snake
sofl-user-group
southafrica
spacemacs
spain
spanish
speaking
spek
spin
splitties
spotify-mobius
spring
spring-security
squarelibraries
stackoverflow
stacks
stayhungrystayfoolish
stdlib
stlouis
strife-discord-lib
strikt
students
stuttgart
sudan
swagger-gradle-codegen
swarm
sweden
swing
swiss-user-group
switzerland
talking-kotlin
tallinn
tampa
teamcity
tegal
tempe
tensorflow
terminal
test
testing
testtestest
texas
tgbotapi
thailand
tornadofx
touchlab-tools
training
tricity-kotlin-user-group
trójmiasto
truth
tunisia
turkey
turkiye
twitter-feed
uae
udacityindia
uk
ukrainian
uniflow
unkonf
uruguay
utah
uuid
vancouver
vankotlin
vertx
videos
vienna
vietnam
vim
vkug
vuejs
web-mpp
webassembly
webrtc
wimix_sentry
wwdc
zircon
Powered by
Title
m

MrPowerGamerBR

06/16/2019, 8:51 PM
I think I found a bug in coroutines... but I'm not sure if it is really a bug or a mistake I made: After the JVM is up for +- 3 days,
delay(...)
stops working, no matter what dispatcher I use (
Dispatchers.Default
,
<http://Dispatchers.IO|Dispatchers.IO>
, custom dispatcher from a custom executor, etc). It just hangs on the
delay
call and never continues. The dispatcher/thread pool isn't overloaded, because the application is still alive and processing every request without any delay... except when there is a
delay
call somewhere, then the code hangs) Restarting the JVM magically fixes the issue, until after a few days it stops working again. Using JVM 12, Kotlin 1.3.21 + Coroutines 1.2.1 (Yes, I know, it is outdated, I didn't update yet because I couldn't get Kotlin's script engine working on 1.3.31... if reeeally needed I can update, but I decided to ask here before I do it to figure out if anyone knows what could cause this issue (or if it was already fixed))
😱 3
🍿 2
Here's a GIF showing the issue, the code is evaluated, executed buuut it gets stuck on the
delay(...)
, no matter what dispatcher I use.

https://cdn.discordapp.com/attachments/393332226881880074/589923294916902961/SayRQ1OGN5.gif

Of course, maybe it is a issue on my end... but I can't figure out what would cause this issue because restarting the application fixes the issue (until it breaks after a few days...). However sometimes (very rarely) the tasks works fine, they execute, are delayed and then they exit... but sometimes they get stuck.
What looks that it happens is that after a veeery long time (way more than the specified delay), the coroutine does continue. Here I executed the code:

https://cdn.discordapp.com/attachments/392476688614948897/589958622000185549/unknown.png

And after almost one hour later... the code was executed.

https://cdn.discordapp.com/attachments/392476688614948897/589958706410553356/unknown.png

Very strange, as I said it isn't that there isn't available threads to process the requests, because launching tasks works, only
delay
seems to have that strange issue. (even suspending functions works fine, except if they call
delay
)
😱 1
e

elizarov

06/16/2019, 11:34 PM
Something is bogging down the thread the does
delay
schedule. It is called “Default Executor”.
What’s your execution infrastructure? Can you take a thread-dump to see what that thread is busy with?
m

MrPowerGamerBR

06/17/2019, 12:52 AM
Yeah sure, let me upload it to gist
I'm using a cached thread pool for the coroutines:
val coroutineExecutor = createThreadPool("Coroutine Executor Thread %d")
	val coroutineDispatcher = coroutineExecutor.asCoroutineDispatcher()

	fun createThreadPool(name: String): ExecutorService {
		return Executors.newCachedThreadPool(ThreadFactoryBuilder().setNameFormat(name).build())
	}
e

elizarov

06/17/2019, 12:54 AM
That should not affect
delay
in the way you describe.
I don’t see anything unusual in the thread-dump, though. The “kotlinx.coroutines.DefaultExecutor” does not seem to be busy with anything — parked as it is supposed to be
m

MrPowerGamerBR

06/17/2019, 12:55 AM
Strange that everything works (calling
suspend
methods, etc), but not delay
I can run the Kotlin Coroutines debug JVM agent, however I don't want to reboot the JVM if possible (if I reboot, the issue goes away until it randomly goes back)
e

elizarov

06/17/2019, 12:58 AM
Any chance you can attach a debugger this running JVM?
One theory I have is that somehow your system becomes slow at thread creation…. Because after 1 sec when you don’t use
delay
its thread is automatically shutsdown, so next time you need to delay it needs to create thread……. But at this thread dump I see the thread up & running
m

MrPowerGamerBR

06/17/2019, 1:03 AM
Yes, but that would require a JVM restart (and that would "fix" the issue until it breaks again, which looks like it takes about three/four days for some reason)
e

elizarov

06/17/2019, 1:03 AM
Can you get a memory dump with
jmap
?
(it works on a running process)
jmap -dump:live,format=b,file=heap.bin <pid>
m

MrPowerGamerBR

06/17/2019, 1:04 AM
Yeah, the only issue is the size of the memory dump, I don't think I would be able to open it because the app has 36GBs memory allocated, but I could try
e

elizarov

06/17/2019, 1:06 AM
Hm… I will not be able to open it…
m

MrPowerGamerBR

06/17/2019, 1:07 AM
I can try reproducing on a smaller instance (this happened before on one of my test instances) but the issue is waiting until the problem happens again 😛
e

elizarov

06/17/2019, 1:08 AM
Is the process remote or local? If it is local, then you can attach debugger from IDEA without restart
m

MrPowerGamerBR

06/17/2019, 1:11 AM
remote
e

elizarov

06/17/2019, 1:12 AM
Bad luck. I have an interesting theory, though.
Do you control all the code that uses coroutine? Do you know all the uses of
delay
in your code?
m

MrPowerGamerBR

06/17/2019, 1:17 AM
Yes
(Well, I think I do, I don't think I do use any other external dependency that uses
delay
)
e

elizarov

06/17/2019, 1:17 AM
Do you use something like
delay(Long.MAX_VALUE)
to wait forever?
m

MrPowerGamerBR

06/17/2019, 1:24 AM
No, I don't... but what about "very big values that aren't `Long.MAX_VALUE`"? (Because if a user says "hey, wait 100 years because idk" and the code waits for... well,
now - 100 years in milliseconds
)
e

elizarov

06/17/2019, 1:27 AM
Neither should be really a problem… I’m still puzzled how to explain it… Especially this “1 hour later wakeup”
m

MrPowerGamerBR

06/17/2019, 1:31 AM
Also, it isn't exactly 1 hour, it looks like it varies
And what is interesting is that the tasks doesn't follow a "order" (This task was delayed first, so it will be executed first), it seems to be random
e

elizarov

06/17/2019, 1:32 AM
That is extremely puzzling. It basically means that something got broken inside the delay’s binary heap data-structure, but I cannot figure out how this could happen — it is pretty robust data structure, simply synchronized and well covered with tests.
m

MrPowerGamerBR

06/17/2019, 1:33 AM
e

elizarov

06/17/2019, 1:34 AM
What could break is a completely broken implementation of
System.nanoTime()
(and jumps back and forth and something like that), but that kind of broken nano-time would wreak a lot of havoc elsewhere. I doubt a running system can have a problem with it.
What I did notice is that it happens after the JVM was up for more than 3 days
e

elizarov

06/17/2019, 1:38 AM
I cannot grasp a significance of “3 days” in this context….
In your last log. That “Coroutine Executor Thread”. What are those? I don’t see them in your thread dump
Also scary big numbers… Like “Coroutine Executor Thread 149779”. Are they sequentially numbered? Looks like quite a big thread churn…
From that log it follows that a new thread was created approx every 2 seconds. That’s not by itself a big number (would not bog down a system) unless something else is going on…
m

MrPowerGamerBR

06/17/2019, 1:42 AM
Yes, they are sequentially numbered
e

elizarov

06/17/2019, 1:43 AM
Why don’t I see them in the thread-dump? What code creates them? Maybe it’s not the
delay
after all, but those threads were not available (missing) at the time?
m

MrPowerGamerBR

06/17/2019, 1:48 AM
This is the code that creates them:
Executors.newCachedThreadPool(ThreadFactoryBuilder().setNameFormat(name).build())
Only Kotlin Coroutines uses that executor, so it can't be something else. Also, wouldn't GC remove cached thread (that are unused) after 60s, causing them to create new threads (which will have a new ID assigined to them)? And if it was the dispatcher, wouldn't using a different dispatcher work? (like
Dispatchers.Default
or
<http://Dispatchers.IO|Dispatchers.IO>
)
e

elizarov

06/17/2019, 3:36 AM
Can it be some kind of contention in somewhere else? Can you try with unconfined dispatcher and with direct printing to console:
GlobalScope.launch(Dispatchers.Unconfined) { 
    println("(0) at ${Instant.now()}")
    delay(1)
    println("(1) at ${Instant.now()}")
}
Do you have console output of that process redirected somewhere?
m

MrPowerGamerBR

06/17/2019, 10:03 AM
I will try doing that asap, while I don't redirect the console output (except when logging with logback) I can see the console output.
Tested it right now and yeah, it doesn't work too, it displays the first
(0)
line but never displays the
(1)
Because the coroutines are still stuck (and that breaks some of the features on my application) I will restart the JVM and update all the dependencies (Kotlin, Kotlin Coroutines, etc) + enable coroutine debug + JVM debug (to remote attach to it)
e

elizarov

06/17/2019, 3:25 PM
It means that indeed something somehow got broken inside the DefaultExecutor’s queue that
delay
uses.
m

MrPowerGamerBR

06/17/2019, 3:40 PM
I hope updating everything + enabling debug will help trying to track down what is causing the issue 🙂
Something strange that I noticed: It looks like that when the tasks are eventually executed, a lot of them are executed at once (even if they weren't delayed at the same time)
I restarted the JVM and did the following changes: 1. Updated Kotlin to 1.3.31 2. Updated Kotlin Coroutines to 1.3.0-M1 3. Updated all dependencies (well, that doesn't even matter but hey) 4. Increased the thread pool cache to keep for 5 minutes (from 60s), set the core pool size to 10 (from 0) 5. Added
-Dkotlinx.coroutines.debug
to the JVM arguments 6. Added a task that keeps creating coroutines, waits 60s, then prints how much time it took and logs it. 7. Added a way to dump currently running coroutines (with the DebugProbe) 8. Enabled remote debugging I know some of the changes doesn't have any relation to the issue, but I was like "let's try changing everything and see what sticks"
e

elizarov

06/17/2019, 11:04 PM
Thanks. Let us see how it goes. Ping me if it reproduces. In debugger I’ll be interested to examine the values of the fields of
DefaultExecutor
object instance.
m

MrPowerGamerBR

06/18/2019, 12:34 AM
I ended up taking out the debug flags, it was slowing everything down and causing issues on my production instance 😞 I will keep it enabled on my test instance (which I hope I can reproduce it there)
@elizarov yeah looks like the uptime doesn't have anything to do with it, my production instance is now borked again while the development instance works fine. I will try enabling debug on the production and see if it doesn't have a too big performance impact (maybe it was the coroutines' debug stuff that has a big performance impact? I didn't test with only one of them disabled because after almost one hour trying to get the production instance to work without having 500%+ CPU usage was hard) But before I restart, is there's any way to get the values of the fields of the DefaultExecutor object instance with reflection? Because if it is, wouldn't it worth a shot to try getting it via reflection (since I can evaluate code without restarting) without enabling debug? Probably it is impossible, but just asking before I restart (and waiting until it decides to break again)
e

elizarov

06/19/2019, 9:01 PM
Yes. You can use “getDeclaredField” and “setAccesible(true)” to get hold on the private fields via reflection
m

MrPowerGamerBR

06/19/2019, 11:31 PM
Oh okay, that would be helpful, can you share what fields should be get from the DefaultExecutor object that would be helpful to debug the issue + how can I get a reference to the DefaultExecutor object? I tried searching around the internet and documentation about it but I couldn't find anything about it
Also, this is just an theory but... if there was a lot (I mean, 100k+ of them) of suspended coroutines, would that affect the delay? Maybe something is broken on my code and is creating a lot of coroutines that stay on the suspended state
e

elizarov

06/20/2019, 3:42 AM
Suspended where? In
delay()
? In theory it should not be a problem, since
delay
queue uses O(log n) algorithm.
m

MrPowerGamerBR

06/20/2019, 1:26 PM
@elizarov I'm trying to dump the variables from the
DefaultExecutor
, while most of them I was able to, getting the
_queue
and
_delayed
values from the
EventLoopImplBase
(which I guess are the ones that you will need) is kinda difficult, they don't show up on the
declaredMembers
list, only on the
declaredFields
list, which then returns a
AtomicReferenceFieldUpdater
on the
_queue$FU
field... but what are the variable's types? In Kotlin it is a
atomic<Any?>(null)
with AtomicFu, but what would that be in Java itself?
e

elizarov

06/20/2019, 1:29 PM
This type is
Object
in Java. Atomics are erased during compilation.
m

MrPowerGamerBR

06/20/2019, 5:08 PM
Yeah, but doesn't the
AtomicReferenceFieldUpdater
has two types?
<*, *>
... well, actually I don't need to know the two types because you can just cast it with
<*, *>
, however I wanted to know what value should be used on the
_queue.get(SomeObjectHere)
. I tried looking at the Kotlin Coroutines code but because it uses AtomicFu, it is just a
_queue.value
call
e

elizarov

06/20/2019, 7:30 PM
You don’t need field updater. What you see in code as
_queue.value
is actually stored in
_queue
field.
m

MrPowerGamerBR

06/20/2019, 8:54 PM
ooooh, sorry for my mistake! I was running my test code on a test project and there the
_delayed
and
_queue
fields were null, but that's because I never spawned any coroutine task. Anyway, here's some of the dumped variables from the
EventLoopImplBase
EventLoopImplBase:
_queue = kotlinx.coroutines.internal.LockFreeTaskQueueCore@44170e37
Size: 0
_queue$FU = java.util.concurrent.atomic.AtomicReferenceFieldUpdater$AtomicReferenceFieldUpdaterImpl@671fab50
_delayed = kotlinx.coroutines.internal.ThreadSafeHeap@3d30a5ae
Size: 3300
_delayed$FU = java.util.concurrent.atomic.AtomicReferenceFieldUpdater$AtomicReferenceFieldUpdaterImpl@3ad4a630
isCompleted = false
The only thing that seems... "off" is the size of the
_delayed
ThreadSafeHeap, but maybe that's a normal size if you are running a big application (my test instance returns
9
for that same var)
e

elizarov

06/20/2019, 9:05 PM
Can you do me a favor. There’s an array
a
inside
ThreadSafeHeap
instance. It contains instance of
DelayedTask
. Can you dump all of them? I’d need
index
and
nanoTime
for each one to see if it is consistent
m

MrPowerGamerBR

06/20/2019, 10:05 PM
Sure! I will do that asap
Aside from some tasks that seem to have negative
nanoTime
, there doesn't seem to be anything wrong
@elizarov after adding some "sanity" checks to a few places where coroutines are created (A function was constantly creating coroutines that were shut down right after they were created... moved the "should I create the coroutine?" check to before the coroutine was created. Added a check to avoid creating tasks that are going to be delayed for >6 months) I cannot reproduce the bug anymore. I'm still trying to debug the issue, but the JVM is currently almost 6 days up and running and the coroutines are still working fine and well 🙂
e

elizarov

07/01/2019, 4:53 PM
Thanks for additional info.
Something is clearly wrong with the heap structure you’ve dumped. It is broken at that explains all the strange effects you saw. However, I cannot yet figure out how this could have happened. The only plausible explanation is that
System.nanoTime()
is not monothonic at your system.
You were using
delay(Long.MAX_VALUE)
, weren’t you?
(What i see in your data could result from it plus slightly backwards-going nanoTime)
If so, i know how to fix it.