Following problem: I maintain a Wolfram Language plugin for IntelliJ, and one of its tasks is to check whether a symbol/variable is defined in the core of the language. I have information about some 50000 (yes, 50k) built-in symbols, and for each symbol I have things like name, namespace, options, usage message, etc. A symbol is uniquely identified by its full name, consisting of namespace and name, e.g. ``Developer`ToPackedArray`` or ``System`Plot``, where `` ` `` is the divider, like `.` in Java/Kotlin. The last part is the name and everything before it is the namespace (it's called a context). Namespaces can be nested like in Java too, so there might be a symbol with the full name ``My`name`space`symbol``.
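To make the naming scheme concrete, here is a minimal sketch of splitting a full name at its last backtick. The class `FullName` and its `parse` method are hypothetical names used only for illustration, and the sketch assumes the context is kept with its trailing backtick (as in ``System` ``).

```java
// Hypothetical helper, not part of the plugin: splits a full symbol name
// into its context and its short name at the last backtick.
final class FullName {
    final String context; // includes the trailing backtick, e.g. "System`"; empty if none
    final String name;    // short symbol name, e.g. "Plot"

    private FullName(String context, String name) {
        this.context = context;
        this.name = name;
    }

    static FullName parse(String fullName) {
        // lastIndexOf returns -1 when there is no backtick, so split becomes 0
        int split = fullName.lastIndexOf('`') + 1;
        return new FullName(fullName.substring(0, split), fullName.substring(split));
    }
}
```

With this, ``My`name`space`symbol`` splits into the context ``My`name`space` `` and the name `symbol`.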
A simple data structure for this is a hash map with entries `fullSymbolName -> symbolInformation`, but I have some operations that need to be blazingly fast because they are called very often:
1. Does the symbol `Plot` exist in the namespace ``System` ``? This can easily be done by prepending the namespace to the symbol name and checking whether the keys contain such an entry.
2. In which namespaces does the symbol `FooBar` exist? Not really fast, as I have to filter the whole list of values or process all keys.
3. What symbols are in the namespace `PacletManager`? Again, I need to filter all values for this.
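To illustrate why only the first operation is cheap with a single map, here is a minimal sketch of the three operations on top of one hash map. The class `NaiveSymbolTable`, its method names, and the use of a plain `String` usage message as a stand-in for the symbol information are all assumptions made for this example. Contexts are assumed to be passed with their trailing backtick.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a single map keyed by the full symbol name.
final class NaiveSymbolTable {
    // full name (context + name, e.g. "System`Plot") -> usage message
    private final Map<String, String> byFullName = new HashMap<>();

    void add(String context, String name, String usage) {
        byFullName.put(context + name, usage);
    }

    // Operation 1: one hash lookup.
    boolean exists(String context, String name) {
        return byFullName.containsKey(context + name);
    }

    // Operation 2: has to look at every key.
    List<String> contextsOf(String name) {
        List<String> result = new ArrayList<>();
        for (String fullName : byFullName.keySet()) {
            int split = fullName.lastIndexOf('`') + 1;
            if (fullName.substring(split).equals(name)) {
                result.add(fullName.substring(0, split));
            }
        }
        return result;
    }

    // Operation 3: has to look at every key as well.
    List<String> symbolsIn(String context) {
        List<String> result = new ArrayList<>();
        for (String fullName : byFullName.keySet()) {
            int split = fullName.lastIndexOf('`') + 1;
            if (fullName.substring(0, split).equals(context)) {
                result.add(fullName.substring(split));
            }
        }
        return result;
    }
}
```

Operations 2 and 3 are linear in the number of symbols here, which is the cost I would like to avoid.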
My current implementation in Java makes this possible by maintaining several different hash sets/lists, but I wonder if anyone can think of a better data structure to make the 3 operations fast. Note that once I have loaded all the information about symbols, the data is read-only and I do not mutate any of it.
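For illustration only, one possible shape of such a combination of indexes is sketched below. It is not the plugin's actual code: the class `IndexedSymbolTable`, its method names, and the `String` usage message standing in for the symbol information are hypothetical. The idea is to fill two extra maps while loading, so that each of the three operations becomes a single hash lookup afterwards.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: three indexes filled once at load time, read-only afterwards.
final class IndexedSymbolTable {
    // full name -> usage message
    private final Map<String, String> byFullName = new HashMap<>();
    // short name -> contexts that contain a symbol with this name
    private final Map<String, Set<String>> contextsByName = new HashMap<>();
    // context -> short names of the symbols in this context
    private final Map<String, Set<String>> namesByContext = new HashMap<>();

    // Called only while loading; context includes the trailing backtick.
    void add(String context, String name, String usage) {
        byFullName.put(context + name, usage);
        contextsByName.computeIfAbsent(name, k -> new TreeSet<>()).add(context);
        namesByContext.computeIfAbsent(context, k -> new TreeSet<>()).add(name);
    }

    // Operation 1
    boolean exists(String context, String name) {
        return byFullName.containsKey(context + name);
    }

    // Operation 2: returns a read-only view, empty if the name is unknown.
    Set<String> contextsOf(String name) {
        return Collections.unmodifiableSet(
                contextsByName.getOrDefault(name, Collections.emptySet()));
    }

    // Operation 3: returns a read-only view, empty if the context is unknown.
    Set<String> symbolsIn(String context) {
        return Collections.unmodifiableSet(
                namesByContext.getOrDefault(context, Collections.emptySet()));
    }
}
```

The trade-off in this sketch is extra memory and several structures to keep consistent during loading, which is acceptable only because the data never changes afterwards.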