Posts

Adventures with Javascript Graphing Libraries

I was looking for a JavaScript library that would display a graph, in the mathematical sense of a node-arc network.

I might require curved lines and maybe arrows, and I need it to be zoomable and pannable.  I also need node labels and click events on both nodes and edges.

Remembering a few links someone sent around showing off impressive applications of libraries such as d3.js and jQuery Sparklines, I had high hopes of finding something polished and snazzy, with superb documentation and an enthusiastic community.

The first thing I noticed is that it seems like everyone is in love with Force-Directed Layouts these days – you know those graphs where the nodes appear “springy” and the spatial layout is “discovered” by the physics algorithm, such as this example.  By their nature, however, these graphs have a dynamic layout – you generally can’t specify exact positions for your nodes.

The libraries I looked at included the following.  A discussion of SVG vs Canvas is here.

d3.js is SVG-based (as opposed to, say, HTML5 canvas-based).  With a cursory look I couldn’t see how to zoom (you should be able to though, right? It’s scalable vector graphics!), and the docs are pretty large and initially confusing.  The consensus seems to be that it is a power user’s tool.

Sigma.js has some nice-looking examples and appears to scale to hundreds of nodes and arcs nicely. It can use GEXF graph format. However, it lacks good documentation and doesn’t seem to have much momentum – the last source code change was 9 months ago.

jsPlumb has demos that do not appear to be zoomable, though the library seems to be still actively developed and has good docs.  It can render by canvas/SVG/VML using jQuery/MooTools etc.  However it seems more aimed at connecting elements with drag and drop than showing graphs per se.

Raphael seems nice enough: it is SVG-based and has intriguing demos and reasonable docs.  The last change on GitHub was 10 months ago, and again the demos had no zoom.

The JavaScript InfoVis Toolkit strikes the right balance of functionality and simplicity for me: it is simple enough that I can actually delve into the source (only one file needed, jit.js) and pretty easily see what’s going on.  It’s canvas-based, but appears to mousewheel zoom (very smoothly) using the canvas scale function.  The project was also involved in a Google Summer of Code.  Although there was no static graph example, I found it quite easy to adapt the force-directed example code to show a nice zoomable graph with node coordinates that I explicitly specified (and that has node and edge click events).  (In doing this, I also discovered something interesting: the force-directed layout example gives you a totally new layout every time you refresh, as you can see by refreshing the demo page here multiple times.  This randomised behaviour seems somewhat less than useful.  Also interesting is the drag and drop of the nodes in the force-directed example.)

So I did not find anything that fit squarely with my requirements, though I’m going with the InfoVis Toolkit for now.  (Of course there is always the possibility of hand-coding it myself, which for reasonably simple requirements is always an option, though a less preferred one, particularly as you might encounter issues such as cross-browser bugs that have already been addressed in the libraries and frameworks.)

On the plus side I truly learnt what Bezier curves actually are (and how cubic Bezier curves differ from quadratic ones), thanks to some help from the Wikipedia page and this neat interactive page where you can drag the control points around to see the effect on the curve.
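
For readers who prefer code to pictures, the difference between the two curve types can be sketched in a few lines of Python using repeated linear interpolation (de Casteljau's construction). The function names are made up for this illustration and are not taken from any of the libraries above.

```python
def lerp(p, q, t):
    """Linear interpolation between points p and q at parameter t in [0, 1]."""
    return tuple((1 - t) * a + t * b for a, b in zip(p, q))

def quadratic_bezier(p0, p1, p2, t):
    """Quadratic curve: end points p0 and p2, one control point p1."""
    return lerp(lerp(p0, p1, t), lerp(p1, p2, t), t)

def cubic_bezier(p0, p1, p2, p3, t):
    """Cubic curve: end points p0 and p3, two control points p1 and p2."""
    return lerp(quadratic_bezier(p0, p1, p2, t),
                quadratic_bezier(p1, p2, p3, t), t)

# Sample both curves at their midpoint.
mid_quadratic = quadratic_bezier((0, 0), (1, 2), (2, 0), 0.5)
mid_cubic = cubic_bezier((0, 0), (0, 1), (1, 1), (1, 0), 0.5)
```

The extra control point is what lets a cubic curve form an S-shape, which a quadratic curve cannot do.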

Some time after doing the above (admittedly rather cursory) research, I then came across another cluster of libraries with a slightly different focus.  These libraries are more geared towards interactive elements on canvas (e.g. drag and drop) and persisting the state of the elements when they are changed – a good starting discussion is at the Stack Overflow page here.  The “master list” of these libraries I found in a Google doc here.  These libraries provide more of a “scene graph” implementation which would be useful if you need a framework for proper tracking of the elements being drawn (especially if they are to be animated, e.g. a particle simulator).

A build and test script in Python

I’ve recently created a script in Python for continuous get-build-test.

It pulls the latest code from a Mercurial repository, builds a bunch of C++ projects, then runs pytest.

The script demonstrates a few simple, but tasty Python techniques including:
– parsing command line flags using optparse
– using subprocess to run shell commands from Python, and capturing the output as it runs
– archiving the build and test results to a log file
– scraping the last line of output of hg pull, py.test etc as a simple (albeit fragile) way to detect success / failure
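
On the last point: for the build and test steps, a less fragile alternative to scraping output is to check the exit status, since both make and py.test exit with a non-zero code on failure. This is not what the script below does; it is a minimal sketch, and the helper name run_and_check is hypothetical.

```python
import subprocess
import sys

def run_and_check(cmd):
    """Run cmd (a list of arguments) and return (succeeded, output_lines)."""
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                         universal_newlines=True)
    output = p.communicate()[0]          # waits for the process to finish
    return p.returncode == 0, output.splitlines()

if __name__ == "__main__":
    # A child Python process stands in for make or py.test here.
    ok, lines = run_and_check([sys.executable, "-c",
                               "print('building'); raise SystemExit(3)"])
    print(ok)
    print(lines)
```

The "no changes found" check on hg pull still needs the output, because a pull that finds nothing new is not a failure.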

I’ve set up a cron job to run this every hour. It only actually does anything if there is changed code from the hg pull.

The cron job is set up with crontab -e and the file looks like:

SHELL=/bin/bash
PATH=/usr/bin:/bin:/usr/local/bin
0 * * * * cd /vol/automatic_build_area && python pull_code_and_build_and_test.py

The path /usr/local/bin had to be added as py.test would not run without it (the path was discovered with the useful “which” command, as in “which py.test”). Furthermore, pytest seemed to need to be run with < /dev/null. (I have noticed that, despite its general awesomeness, pytest does have some strange quirks when it comes to general environment issues – the above for example, plus treatment of global variables).
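
If the command is launched from Python rather than through a shell string, the same effect as < /dev/null can be had by connecting the child's stdin to the null device. A minimal sketch, assuming Python 3.5 or later (for subprocess.run and subprocess.DEVNULL); the helper name is hypothetical.

```python
import subprocess
import sys

def run_without_stdin(cmd):
    """Run cmd with stdin attached to the null device, like `cmd < /dev/null`."""
    return subprocess.run(cmd, stdin=subprocess.DEVNULL,
                          stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                          universal_newlines=True)

# The child reads all of stdin and reports how many characters it got.
result = run_without_stdin([sys.executable, "-c",
                            "import sys; print(len(sys.stdin.read()))"])
```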

Here is the script:

from optparse import OptionParser
import subprocess
import datetime

brief_output = False
all_lines = []

def runProcess(cmd):
    # universal_newlines=True returns text rather than bytes, so this also runs on Python 3
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                         shell=True, universal_newlines=True)
    while p.poll() is None:
        if not p.stdout.closed:
            data = p.communicate()[0]
            if data is not None:
                for line in data.split("\n"):
                    yield line

def run_shell_cmd(cmd, force_brief_output=False):
    if type(cmd) is str:
        cmd = [cmd]

    lines = ["Running: " + " ".join(cmd) + "\n"]
    print("".join(lines))
    for line in runProcess(cmd):
        if not brief_output and not force_brief_output:
            print(line.replace("\n", ""))
        lines.append(line + "\n")

    while not lines[-1] or lines[-1] == "\n":  # pop off trailing empty lines
        lines.pop()

    if not force_brief_output:
        all_lines.extend(lines)
    return lines

def pull_build_and_test(build_only, test_only):
    if build_only and not test_only:
        print("Build only - not pulling or testing")
    if not build_only and test_only:
        print("Test only - not pulling or building")
    if build_only and test_only:
        print("Build and test only - not pulling")

    pull_output = []
    if not build_only and not test_only:
        pull_output = run_shell_cmd("hg pull -u")

    if not build_only and not test_only and (not pull_output or pull_output[-1] == "no changes found\n"):
        print("No changes to repo from hg pull")
    else:
        if not (not build_only and test_only):  # build unless in test-only mode
            run_shell_cmd("make clean")
            make_output = run_shell_cmd("make")

            if not make_output or make_output[-1] != "-- Success --\n":
                print("Build failure!")
                # send an email, for example: "Failure at " + datetime.datetime.now().strftime("%d %b %Y %H:%M")
                # + " - Build failure" with body "".join(pull_output + ["\n\n"] + make_output)
                return False
            print("Build success!  C++ engines all built")

        if not (build_only and not test_only):  # test unless in build-only mode
            pytest_output = run_shell_cmd("py.test -v < /dev/null")

            if not pytest_output or "======================" not in pytest_output[-1] or " passed" not in pytest_output[-1] \
                    or " failed" in pytest_output[-1] or " error" in pytest_output[-1]:
                print("Pytest failure!")
                # send an email, for example: "Failure at " + datetime.datetime.now().strftime("%d %b %Y %H:%M")
                # + " - Pytest failure" with body "".join(pull_output + ["\n\n"] + pytest_output)
                return False
            print("Test success!  All tests have passed")
    return True

if __name__ == "__main__":
    all_lines = ["\n\n\n-----**---- Automatic build and test " + datetime.datetime.now().strftime("%d %b %Y %H:%M") + "\n\n"]
    parser = OptionParser()
    parser.add_option("-b", "--build_only", dest="build_only", action="store_true", default=False)
    parser.add_option("-t", "--test_only", dest="test_only", action="store_true", default=False)
    parser.add_option("-l", "--less_output", dest="less_output", action="store_true", default=False)
    (options, args) = parser.parse_args()
    brief_output = options.less_output
    success = pull_build_and_test(options.build_only, options.test_only)
    all_lines.append("\n\n-------------------- Automatic build and test summary: success = " + str(success) +
                     " ------- Finished running " + datetime.datetime.now().strftime("%d %b %Y %H:%M") + " -------------\n\n")
    with open("automatic_build_and_test.log", "a") as log_file:  # append results to the log file
        log_file.write("".join(all_lines))

 

Acknowledgments to this Stack Overflow solution for pointers on how to capture subprocess output as it’s running, although the above function is much more robust (doesn’t seem to fail from timing problems when there is multiline output etc).
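
One caveat about runProcess: communicate() waits for the process to exit before it returns anything, so the lines arrive in one batch at the end rather than while the command is still running. A possible variant that yields each line as soon as it is written is sketched below. The name stream_process is hypothetical, the command is passed as a list rather than through the shell, and the child may still buffer its own output.

```python
import subprocess
import sys

def stream_process(cmd):
    """Yield each line of cmd's combined stdout/stderr as soon as it is available."""
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                         universal_newlines=True)
    for line in p.stdout:        # blocks until the next line or end of output
        yield line.rstrip("\n")
    p.stdout.close()
    p.wait()                     # reap the child once its output is exhausted

if __name__ == "__main__":
    # A child Python process stands in for a long-running build.
    for line in stream_process([sys.executable, "-c", "print('a'); print('b')"]):
        print(line)
```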

Function within a Function in C++

I have been coding in C++ for a number of years. Only recently did I find out that there is an indirect way to write a function within a function. I have found myself using this little workaround more and more recently. When coupled with the use of a static variable within a function, the ability to have “Everything In Its Right Place” i.e. avoid unnecessary functions and variables in the global namespace, is particularly appealing to my inner code nazi.

The below code snippet shows an example…

Note

– I have only tested the code snippet below for compilation using Visual Studio
– Visual Studio’s debugger does not allow you to “watch” spndI when execution is inside the function NodeIsOfSameTypeAsIButNotI

– I have formatted the code for the narrow width afforded by this blog, so the example looks more verbose than it otherwise would. Blogger may also mangle some of the code when it is posted (causing me to re-edit the post over and over until Google gets it right)

#include <cstddef>
#include <vector>
using std::size_t;
using std::vector;

struct CNode
{
    // shell struct to enable code compilation
    double GetLongitude() const {return 0;}
    double GetLatitude() const {return 0;}
    double GetType() const {return 0;}
};
template <class T> class CSomeClassHoldingNodesForFastFinding
{
public:
    typedef bool(*NodeCanBeClosest)(const T*);
    inline T* FindClosest
    (
        double dblLongitude,
        double dblLatitude,
        NodeCanBeClosest pNodeCanBeClosest
    )
    {
        return NULL; // Really return a T* not NULL
    }
};
void FindClosestNodePairsAndDoSomethingWithThem
(
    vector<CNode*>& rgpnd,
    CSomeClassHoldingNodesForFastFinding<CNode>* pFastFindNodes
)
{
    // static, so that the local struct's static function can refer to it
    static CNode* spndI = NULL;
    for (size_t nNodeI = 0; nNodeI != rgpnd.size(); nNodeI++)
    {
        spndI = rgpnd[nNodeI];
        // the "function within a function": a static member of a local struct
        struct Func
        {
            static bool NodeIsOfSameTypeAsIButNotI(const CNode* pnd)
            {
                return
                    pnd != spndI
                    &&
                    pnd->GetType() == spndI->GetType();
            }
        };
        CNode* pndJ = pFastFindNodes->FindClosest
        (
            spndI->GetLongitude(),
            spndI->GetLatitude(),
            &Func::NodeIsOfSameTypeAsIButNotI
        );
        // .. Do something with spndI and pndJ
    }
}

Using SimPy and Python

I recently completed a discrete-event simulation model using SimPy. This was my first foray into Python programming and the first time I used a non-graphical discrete-event simulation package (most of my previous experience was using Witness).

The model tracks the utilisation of wagons on trains. Key constraints include availability of staff, availability of a path through the network for movement of trains and wagons, and availability of a spare locomotive to make ancillary wagon movements. Maintenance activities are simulated as are wagon faults and repairs.

The rail network is represented using a .dot file; our brilliant computer scientist, Loki, devised a mechanism to find a path through the network using the pygraph minmax module (pydot is used to parse the .dot file).
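
The path-finding idea can be illustrated without any third-party packages. The sketch below is not the mechanism used in the model and does not use the pygraph API; it is a plain Dijkstra search over a small dictionary, and the station names and distances are invented.

```python
import heapq

def shortest_path(graph, start, goal):
    """Dijkstra's algorithm on a dict-of-dicts graph, where graph[u][v] is the edge cost.

    Returns (total_cost, nodes_on_path), or (None, []) if goal cannot be reached.
    """
    queue = [(0, start, [start])]       # (cost so far, node, path taken)
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, weight in graph.get(node, {}).items():
            if neighbour not in visited:
                heapq.heappush(queue, (cost + weight, neighbour, path + [neighbour]))
    return None, []

# Hypothetical four-node rail network (directed edges with made-up lengths).
network = {
    "Yard": {"JunctionA": 4, "JunctionB": 1},
    "JunctionB": {"JunctionA": 2, "Port": 7},
    "JunctionA": {"Port": 3},
    "Port": {},
}
cost, path = shortest_path(network, "Yard", "Port")
```

In a simulation, a search like this would be re-run whenever a train needs to move, with occupied track sections removed from the graph first.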

Although the learning curve was quite steep for me, I found that I really enjoy working with both Python and SimPy. It was liberating not to be fiddling with graphics which must be placed precisely on the screen or manipulated to get element interactions to work perfectly. The use of a .dot file to represent the rail network means that it can be changed at will without requiring a vast amount of re-work to fix graphics (the initial network is a high-level representation of the actual network which will be refined in the future). As an added bonus, Graphviz can be used to view and modify the .dot files.

SimPy doesn’t come loaded with pre-configured entities (like machines, staff, tracks, forklifts, etc.). However, you can create any entity you desire and define its behaviour using the Python programming language. Since SimPy is open source, all code is freely available and can be modified to provide further functionality. This provides far greater flexibility than I’ve experienced with other packages, since I was not limited by behaviours defined by the package developers or by a package-specific macro language. As with other simulation packages I’ve used, I created variables to track important information and then created both .txt and .csv output files on which extensive analysis can be performed.
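
To give a flavour of event-driven modelling without graphics, here is a minimal sketch that uses only the standard library. It does not use SimPy's API and is far simpler than the model described above; the wagon names, times and the single-locomotive rule are invented for illustration.

```python
import heapq

def simulate_shunting(requests, shunt_time):
    """Serve wagon movement requests with one spare locomotive, in arrival order.

    requests: list of (arrival_time, wagon_name).
    Returns rows of (wagon_name, arrival, start, finish, waited).
    """
    events = list(requests)
    heapq.heapify(events)               # event list ordered by time
    loco_free_at = 0                    # when the locomotive is next available
    rows = []
    while events:
        arrival, wagon = heapq.heappop(events)
        start = max(arrival, loco_free_at)    # wait if the locomotive is busy
        finish = start + shunt_time
        loco_free_at = finish
        rows.append((wagon, arrival, start, finish, start - arrival))
    return rows

rows = simulate_shunting([(0, "W1"), (1, "W2"), (10, "W3")], shunt_time=5)
```

Each row is the kind of tracked information that could be written out to a .csv file for later analysis.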

While the cost of using Python and SimPy was a significant learning curve (for me) I feel that it was definitely worth it. We’ll be looking to use SimPy on future projects!

Special thanks to Klaus Muller for his patience and assistance in answering my questions!

The Wonders of Python

I recently wrote a little code generator in Python that takes in a schema file in XML format and expands out specially marked up tags inside C++ code. It was my first real industrial strength use of Python and (for this C++ veteran at least) I was amazed at how much I could accomplish in just 300 lines of code. In particular I liked:

  • Not having to compile!
  • List comprehensions
  • String slicing and dicing
  • Returning multiple values from a function
  • Some really nice constructs:
return “True” if (self.Type not in self.VariableTypes) else “False”
This one sure does flow like natural language.
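
A few of the items above in one short sketch. This is not code from the generator itself; the tag format and all names here are made up for illustration.

```python
def split_tag(tag):
    """Return two values at once: the name and the type from a 'name:type' tag."""
    name, _, type_name = tag.partition(":")
    return name, type_name

tags = ["width:int", "label:string", "height:int"]   # made-up schema tags

# List comprehension: keep only the names of the int-typed tags.
int_names = [split_tag(t)[0] for t in tags if split_tag(t)[1] == "int"]

# Slicing: the first three characters, and a string reversed.
prefix = tags[1][:3]
backwards = tags[0][::-1]

# Conditional expression, in the same shape as the one quoted above.
known_types = ["int", "string"]
is_custom = "True" if ("double" not in known_types) else "False"
```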

Dynamic typing sometimes feels like a free for all, but I’ll gladly pay that price for the many powerful features it enables. I sure am writing “self” a lot though 🙂

I think the real epiphany moment was realising that there is essentially no real complexity inherent in the language – this is very liberating compared to C++ where you have to constantly steer clear of C++’s many dark corners (about which entire books have been written, e.g. C++ Gotchas by Stephen Dewhurst). Programming really does become less of a struggle.

A somewhat tangential link musing over Python vs C++ in the context of unit testing is this blog post, which describes approaching building a Sudoku solver. Those interested in the Sudoku-solving ability of Python should also be sure to check out this impressive presentation on AI, puzzles, and the use of ‘itertools’ in Python.

Managing Database Connections “with”

I’m using a pool to manage database connections. I decided to change from having per-thread database connections to having per-function database connections.

I was unsure of how to do this elegantly and thought decorating the functions might be the best approach. However, I really wanted to have local variables defined: enter context managers and the with statement.


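from contextlib import contextmanager
import psycopg2.extras

# pool is the connection pool, created elsewhere
@contextmanager
def db_wrap():
    conn = pool.getconn()
    dict_cur = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
    cur = conn.cursor()
    try:
        yield conn, cur, dict_cur
    finally:
        pool.putconn(conn)  # always hand the connection back to the pool

and to use it:

with db_wrap() as (conn, cur, dict_cur):
    cur.execute('select stuff etc')
    conn.commit()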

Rather elegant 🙂

Official docs: http://docs.python.org/library/contextlib.html
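
The main benefit of the try/finally is that the connection goes back to the pool even when the body of the with block raises. This can be checked without a database, as in the sketch below. FakePool is a hypothetical stand-in that only counts connections, and db_wrap is cut down to yield just the connection.

```python
from contextlib import contextmanager

class FakePool(object):
    """Hypothetical stand-in for a real connection pool; it only counts connections."""
    def __init__(self):
        self.checked_out = 0

    def getconn(self):
        self.checked_out += 1
        return object()

    def putconn(self, conn):
        self.checked_out -= 1

pool = FakePool()

@contextmanager
def db_wrap():
    conn = pool.getconn()
    try:
        yield conn
    finally:
        pool.putconn(conn)   # runs on normal exit and when the body raises

def failing_query():
    with db_wrap() as conn:
        raise RuntimeError("query failed")

try:
    failing_query()
except RuntimeError:
    pass                     # the connection has already been returned by now
```

After the failed call, pool.checked_out is back to zero.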

Loki