This package provides two capabilities that can be useful when running long computational experiments in IJulia notebooks:
-
It allows you to return all variables to a previous state (the past). This is useful if you run experiments in Julia cells that can take minutes or longer to complete, and want to re-examine the variables in a cell you have over-written.
-
It allows you to spawn a process to run on another thread while you keep writing other cells (the future). The process sandboxes the variables it uses, so it does not impact other cells. This is especially useful if you plan to run many similar experiments.
Now, you can just copy, paste and modify cells before running them.
More documentation can be found here.
using Pkg; Pkg.add("IJuliaTimeMachine")
Once you are running a Jupyter notebook, you can start the time machine by typing
import IJuliaTimeMachine
As the name of the package is rather long, and all of its commands require it as a prefix, I recommend renaming it like
TM = IJuliaTimeMachine
The rest of these docs assume you have renamed it to TM
.
If you use the Time Machine a lot, and don't want to type TM
, all the time, you can instead type
using IJuliaTimeMachine
This will export the function vars
and the macro @past
.
To check how many threads you have available, type
Threads.nthreads()
If you only have one then the @thread
macro will not work.
Most modern computers allow for more than one thread after some configuration.
To make sure that your Jupyter notebook starts with threads, and you are running Jupyter from a cell, you could type
export JULIA_NUM_THREADS=4
jupyter notebook
Or, on a Mac, put the following line in the file
~/.profile
or ~/.zshrc
:
export JULIA_NUM_THREADS=2
Of course, replace 2 or 4 with the number of threads you should have. Usually, this is twice the number of cores.
To find out how many this could be, you could start Julia with the -t auto
option, and then check how many threads it chooses to start with.
First, note that IJulia already provides some history functionality.
It maintains dictionaries In
and Out
that store the input (contents) and output (ans) of every cell.
To see the answer computed in cell 20, examine Out[20]
.
To go back to the state as it was after cell 20, at any time, type TM.@past 20
.
If you just want to look at a dictionary of the variables from cell 20, type TM.vars(20)
.
To stop saving state, type TM.saving!(false)
.
This is especially useful if IJuliaTimeMachine is causing errors. To start up again, type TM.saving!(true)
.
If you want to turn off IJuliaTimeMachine, run TM.unhook()
.
To prevent IJuliaTimeMachine from saving a variable x
, run TM.dontsave(x)
.
If you need to free up memory, type TM.clear_past()
to clear all the saved state information.
TM.clear_past(cells)
clears the states in the iterator (or range) given by cells
. It also clears all variables that are saved only in those states.
All of the saved data is kept in a structure that we internally call a Varchive
. It is stored at TM.VX
. If you want to save all variables so that you can recover them when restarting Jupyter, save this variable. For example, using
bson("vars from this notebook.bson", VX = TM.VX)
You can then load and access dictionaries of those variables using TM.vars(VX, n)
. Say, to get the variables from cell 10, you could type
VXold = BSON.load("vars from this notebook.bson")[:VX]
TM.vars(VXold, 10)
If picking variables out of that dictionary is too slow for you, you can emulate the @past
macro and put all the variables from the dictionary into Main by typing
TM.@dict_to_main(TM.vars(VXold,10))
Of course, you can use any dictionary in place of TM.vars(VXold,10)
.
You can run code in a thread by using TM.@thread
. It can be used at most once per cell.
Examples are like.
TM.@thread my_intensive_function(x)
TM.@thread begin
a number of computationally intense lines
end
TM.running
keeps track of cells that are running.
TM.finished
of course keeps track of those that stopped.
By default, notifications about finished cells are printed to the terminal from which Jupyter was started. You can turn this on or off with TM.notify_terminal!()
.
You can choose to have notifications printed to the current Jupyter cell by setting TM.notify_jupyter!(true)
.
You can find a demonstration of the time machine in action in the [examples
directory](https://github.com/](https://github.com/danspielman/IJuliaTimeMachine.jl/tree/master/examples).
It is saved as a Jupyter notebook, html, and pdf.
If IJuliaTimeMachine develops problems, it can cause strange errors to appear in every cell. The usual reason is some type of variable that it does not know how to handle. The easy solution is to prevent saving of that variable with dontsave
.
If that doesn't fix it, you will probably want to disable the Time Machine. The following command does this
TM.unhook()
Please help improve this. Someone who understands Julia Macros and internals could do a much better job of this. Feel free to file issues, create pull requests, or get in touch with daniel.spielman@yale.edu
if you can improve it.
Here are some things that would be worth doing:
-
Find a good way to save DataTypes. This is achievable for
@past
. But, it is trickier to make it work for@threads
. It would be good to use a consistent solution. -
Find a way to copy and save functions
-
Create a GUI to keep track of which spawned processes are running, and which have finished.
-
Think of what other features this needs.
-
Figure out a way to create tests for this package. The difficulty is that it needs to run inside Jupyter.
-
Output from @thread that is supposed to go to stdout winds up in whatever cell is current. It would be terrific to capture this instead, and ideally make it something we can play back later.
-
Sometimes we get an error that says
error in running finalizer: ErrorException("concurrency violation detected")
. Not sure why.
-
Time Machine only saves variables that are in
Main
.
It stores them inTM.VX
. The data structure is described invarchive.jl
. -
The Time Machine only saves variables that can be copied with deepcopy. In particular, it does not save functions. It would be nice to add a way to copy functions.
-
It keeps track of these variables by hashes (using
tm_hash
, which is more robust thanBase.hash
). So, if two variables store data that has the same hash, one of them will be lost. This is unlikely to be a problem for most notebooks, because a heuristic probabilistic, analysis of hashing suggests that the chance of a collision when there arev
variables is aroundv^2 / 2^64.
-
The state saving features work by using an IJulia
postexecute_hook
. This would not work for processes launched with@thread
because their postexecute hooks fire before the job finishes. So, those jobs finish by putting the data they should save into a queue. That data is then saved into VX during the preexecute phase of the next cell execution, using a preexecute hook. The queue is managed with a SpinLock so that two threads can not write to it at the same time.
The development of this package has been supported in part by a Simons Investigator Award to Daniel A. Spielman.