Sunday, October 11, 2009

Finished

It should be obvious to anyone following this blog that the 10 weeks of this studentship project have long since ended; however, until now there were a few outstanding issues. I can now finally say that the project is finished and ready for public use. It can be found at http://epdt77.ph.bham.ac.uk:8080/webgrid/, although the link may change at some point in the future and the "add to iGoogle" buttons won't work for now.

The gadget is currently configured to use all available data from 2009. Specifically, it holds data for all jobs submitted in 2009 up to around mid-September (the most up-to-date data available from the Grid Observatory).

In addition to this I have produced a small report giving an overview of the project which is available here.

Friday, July 31, 2009

Another Update (Day 30)

As week six comes to a close, I thought it was about time for another progress update. As last time, I have reused the bullet points from two posts back, with new additions in italics.

  • GridLoad style stacked charts.

  • A "League Table" (totaled over the time period).

  • Pie charts (of the "League Table").

  • Filters and/or sub filters (just successful jobs, just jobs by one VO, just jobs for this CE etc).

  • A tabbed interface.

  • Regular Expression based filtering

  • Variable Y-axis parameter (jobs submitted, jobs started, jobs finished etc).

  • Transition to university web servers.

  • Move to a more dynamic chart legend style. Not done for pie chart yet.

  • Ensure w3 standards compliance/cross-browser compatibility & testing.

  • Automate back end data-source (currently using a small sample data set). Need automatic renewal of grid certificates.

  • Variable X-axis time-step granularity.

  • Data/image export option.

  • A list of minor UI improvements (about 10 little jobs, not worth including in this list, as they would be meaningless without going into a lot more detail about how the gadget's code is implemented).

  • Optimise database queries and general efficiency testing.

  • Make the interface more friendly. (Tool-tips etc.)

  • Possible inclusion of more "real time" data.

  • Gadget documentation.

  • A simple project webpage.

  • A JSON data-source API reference page.

  • 2nd gadget: show all known info for a given JobID.

  • 2nd gadget: Add view of all JobIDs for one user (DN string).


  The items in this list are now approximately in the order I intend to approach them.

    On another note, I have finally managed to get some decent syntax highlighting for Google Gadgets, thanks to this blog post, even if it means being stuck with Vim. To get this to work, add the Vim modeline to the very bottom of the XML gadget file; if it is added at the top, it tends to break things such as the gadget title. Whilst Vim is not my editor/IDE of choice, it's pretty usable and can, with some configuration, match most of the key features I use in Geany (such as showing whitespace). However, Geany's folding feature would save a lot of time and scrolling.

    Friday, July 17, 2009

    Progress Update

    Yesterday I gave a short presentation on the progress of this project. The slides for this presentation are available for OpenOffice as ODP or as PDF.
    The presentation included a summary of progress on the gadget outlined in the previous blog post. Using the bullet points from the previous post, the current to-do list now looks something like this (points in italics are new ideas):

    • GridLoad style stacked charts.

    • A "League Table" (totaled over the time period).

    • Pie charts (of the "League Table").

    • Filters and/or sub filters (just successful jobs, just jobs by one VO, just jobs for this CE etc).

    • A tabbed interface.

    • Variable X-axis time-step granularity.

    • Variable Y-axis parameter (jobs submitted, jobs started, jobs finished etc).

    • Data export option.

    • Automate back end data-source (currently using a small sample data set).

    • Regular Expression based filtering (Full regexp support added after the presentation.)

    • Make the interface more intuitive, including tool-tips etc.

    • Optimise database queries.

    • Make the interface more friendly.

    • Move to a more dynamic chart legend style.

    • Ensure w3 standards compliance/cross-browser compatibility & testing.

    • Possible inclusion of more "real time" data.

    Today I was particularly pleased to fix a small bug that had been a persistent problem for quite a while. The problem concerned an SQL query containing both a GROUP BY and a WHERE clause. Specifically, I was grouping by DATE(columnid) to produce a COUNT of the number of jobs submitted per day, WHERE the jobs had to meet a set of criteria. However, the WHERE clause caused days on which no jobs met the criteria to be omitted from the results entirely. Whilst this is perfectly reasonable behaviour, I needed those days to be shown (with a COUNT of '0' next to them). My initial thought was to JOIN (right outer) this result set onto a list of all possible days; unfortunately this had no effect, as the WHERE clause still removed the rows for the empty days. After much Google-ing, I found the solution was rather trivial, although very non-intuitive: I simply had to switch the WHERE for an AND, which moves the filter conditions into the join's ON clause. Conditions in the ON clause only decide which jobs are matched to each day, so a day with no matching jobs is still kept (padded with NULLs, which COUNT(jobs) counts as 0). A WHERE clause, by contrast, is applied after the join and discards those padded rows, because a comparison against NULL is never true.

    Original:
    SELECT DATE(jobsubmitedtimestamp) AS date, COUNT(jobs) FROM maintable WHERE ui REGEXP 'ch' AND vo='alice' GROUP BY date;

    Joined, but still not right:
    SELECT days.day, COUNT(jobs) FROM maintable RIGHT JOIN days ON maintable.date=days.day WHERE ui REGEXP 'ch' AND vo='alice' GROUP BY days.day;

    Solution:
    SELECT days.day, COUNT(jobs) FROM maintable RIGHT JOIN days ON maintable.date=days.day AND ui REGEXP 'ch' AND vo='alice' GROUP BY days.day;

    (The above are simplified versions of my queries.)
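The difference between the WHERE and ON versions can be reproduced with a small self-contained script. This is a minimal sketch rather than the project's actual code: it assumes Python's built-in sqlite3 module as a stand-in for MySQL, a made-up three-day data set, and a hand-registered REGEXP function (SQLite has no built-in implementation). Because older SQLite versions do not support RIGHT JOIN, the days table is placed on the left of a LEFT JOIN instead, which gives the same result.

```python
import re
import sqlite3


def regexp(pattern, value):
    """SQLite evaluates "X REGEXP Y" as regexp(Y, X); NULL in gives NULL out."""
    if value is None:
        return None
    return int(re.search(pattern, value) is not None)


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.create_function("regexp", 2, regexp)  # SQLite has no built-in REGEXP
    conn.executescript("""
        CREATE TABLE days (day TEXT);
        CREATE TABLE maintable (jobs INTEGER, date TEXT, ui TEXT, vo TEXT);
        INSERT INTO days VALUES ('2009-07-01'), ('2009-07-02'), ('2009-07-03');
        -- Hypothetical jobs: nothing on 2009-07-02 matches the filter.
        INSERT INTO maintable VALUES
            (1, '2009-07-01', 'ui.ch', 'alice'),
            (2, '2009-07-01', 'ui.ch', 'alice'),
            (3, '2009-07-02', 'ui.uk', 'atlas'),
            (4, '2009-07-03', 'ui.ch', 'alice');
    """)
    return conn


# Filter in WHERE: applied after the join, so days with no matching jobs vanish.
WHERE_QUERY = """
    SELECT days.day, COUNT(jobs) FROM days
    LEFT JOIN maintable ON maintable.date = days.day
    WHERE ui REGEXP 'ch' AND vo = 'alice'
    GROUP BY days.day ORDER BY days.day
"""

# Filter in ON: applied while matching, so unmatched days survive with COUNT 0.
ON_QUERY = """
    SELECT days.day, COUNT(jobs) FROM days
    LEFT JOIN maintable ON maintable.date = days.day
        AND ui REGEXP 'ch' AND vo = 'alice'
    GROUP BY days.day ORDER BY days.day
"""

conn = make_db()
print(conn.execute(WHERE_QUERY).fetchall())  # [('2009-07-01', 2), ('2009-07-03', 1)]
print(conn.execute(ON_QUERY).fetchall())     # [('2009-07-01', 2), ('2009-07-02', 0), ('2009-07-03', 1)]
```

Note that COUNT(jobs) (a column from the joined table) is what turns the NULL-padded row into a 0; COUNT(*) would count the padded row itself and report 1.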

    Thursday, July 9, 2009

    First Gadget (Day 14)

    A few days ago I "finished" my first WLCG monitoring gadget, although I use the term "finished" very loosely. More accurately, my first proper attempt at a monitoring gadget had reached the stage at which it was usable and, if I were working with a full, up-to-date data set rather than a small sample, could potentially be useful. My original plan for this gadget and the gadget itself are shown below.

    [Embedded gadget removed (old gadget no longer functions, as data source has been updated)]

    It is now my intention to continue developing this gadget into something much richer in features and options. Some of the features that have been suggested, and that I would eventually like to incorporate into the gadget, include:
    • GridLoad style stacked charts.
    • A "League Table" (totaled over the time period).
    • Pie charts (of the "League Table").
    • Filters and/or sub filters (just successful jobs, just jobs by one VO, just jobs for this CE etc).
    • A tabbed interface.
    • Variable X-axis time-step granularity.
    • Variable Y-axis parameter (jobs submitted, jobs started, jobs finished etc).
    • Data export option.
    A rough idea of what this might look like/how it might function is available here (not embedded, as it's a large file). Another suggested feature has been to replace the legend with coloured check boxes for even easier showing/hiding of trends; however, the standard HTML check box does not support custom colours. As such, this goal has been put on hold until the rest of the design has been implemented.

    Thursday, July 2, 2009

    Google Gadgets (Day 9)

    For those unfamiliar with the concept, Google Gadgets are generally small "widgets" which can be added to any web page (although they are rarely used outside of iGoogle homepages). Two sample gadgets are shown below, one very simple and one slightly more complex.

    Counter gadget: [Embedded gadget not reproduced]

    Date/time gadget: [Embedded gadget not reproduced]

    The gadget itself consists of a publicly accessible XML file, for example: Counter gadget or Date/time gadget. These XML files can contain a range of preference values, including the gadget title and author, but most importantly they contain a "CDATA" section. The contents of this section are not parsed as XML and are instead treated as HTML; this is where the main content of the gadget resides. The HTML in this section is very similar to the HTML found in any standard web page, with a few key differences.

    Probably the biggest of these differences is that the HTML no longer contains <head> or <body> tags, only the content of the "body" tag (as shown below). However, this does not prevent the inclusion of JavaScript or CSS, which can still be included with <script> or <style> tags in the "body" of the gadget.
    <?xml version="1.0" encoding="UTF-8" ?>
    <Module>
    <ModulePrefs title="hello world example" />
    <Content type="html">
    <![CDATA[
    <!-- START OF HTML -->
    Hello, World
    <!-- END OF HTML -->
    ]]>
    </Content>
    </Module>
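One way to see how the CDATA section behaves is to parse the hello-world example above. The sketch below uses Python's standard xml.etree.ElementTree rather than anything Google-specific, and the function name describe_gadget is purely illustrative. It shows that the ModulePrefs attributes are read as ordinary XML, while everything inside the CDATA section comes back as one block of unparsed text, HTML comments included.

```python
import xml.etree.ElementTree as ET

GADGET_XML = """<?xml version="1.0" encoding="UTF-8" ?>
<Module>
<ModulePrefs title="hello world example" />
<Content type="html">
<![CDATA[
<!-- START OF HTML -->
Hello, World
<!-- END OF HTML -->
]]>
</Content>
</Module>"""


def describe_gadget(xml_bytes):
    """Pull the title, content type and raw HTML out of a gadget file."""
    module = ET.fromstring(xml_bytes)
    prefs = module.find("ModulePrefs")
    content = module.find("Content")
    return {
        "title": prefs.get("title"),
        "content_type": content.get("type"),
        # The CDATA wrapper disappears after parsing; its contents arrive as
        # plain text, so the HTML comments inside it are kept verbatim.
        "html": content.text.strip(),
    }


info = describe_gadget(GADGET_XML.encode("utf-8"))
print(info["title"])  # hello world example
print(info["html"])
```

Had the HTML been placed in the XML without the CDATA wrapper, the parser would have treated the comments as XML comments and discarded them, and any unescaped "<" or "&" in the HTML or JavaScript would make the file invalid XML.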

    For anyone considering developing Google Gadgets, it is definitely worth noting that Google cache Gadgets heavily, so even if you update the publicly accessible XML file for a Gadget you may not immediately see any change in the browser-rendered Gadget. One easy way around this is to test your Gadget on an iGoogle homepage with the developer gadgets added, which allow you to disable caching for specific Gadgets. However, this will not prevent the caching of any external JavaScript or CSS files linked to by your Gadget.

    A further complication caused by the caching of gadgets is that they must be static, not dynamically generated. In the context of this project, that means the gadget file itself cannot contain any of the ever-changing Grid data. Any data required by the Gadget must instead be loaded with an AJAX-style call (in this project I will be using JSON responses instead of XML). The simplest way to prevent the requested page from being cached is to append a randomly generated number to the end of the GET request URL each time the page is called, e.g. "'http://www.url.com/file.php?rand=' + randomNumber()". The more experienced web developer may at this point be asking, "Aren't JavaScript GET requests locked to the current domain? How are you going to access data outside of google.com?" The answer is that yes, they are domain-locked; however, Google provide a range of alternatives to the standard JavaScript XMLHttpRequest object, each of which makes Google act as an "AJAX proxy", so that the browser fetches the data from within the Google domain. This is shown schematically in the picture below, where steps 4-8 would be skipped if the Gadget file was already cached and steps 9-13 would be skipped if the requested page was already cached:
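The cache-busting trick only needs a little care when the URL already has a query string. In the gadget itself this would be done in JavaScript before the URL is handed to Google's proxy; the sketch below uses Python purely for consistency with the other examples, and the helper name add_cache_buster and the "rand" parameter name are illustrative assumptions.

```python
import random
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def add_cache_buster(url, rng=random):
    """Return url with a random 'rand' query parameter appended.

    Existing query parameters are preserved, so this works whether or not
    the URL already contains a '?'.
    """
    scheme, netloc, path, query, fragment = urlsplit(url)
    extra = urlencode({"rand": rng.randint(0, 10**9)})
    query = f"{query}&{extra}" if query else extra
    return urlunsplit((scheme, netloc, path, query, fragment))


rng = random.Random(42)  # seeded only so the example is repeatable
url = add_cache_buster("http://www.url.com/file.php?vo=alice", rng)
params = parse_qs(urlsplit(url).query)
print(params["vo"], "rand" in params)  # ['alice'] True
```

Because every request URL is different, neither the browser nor the proxy can serve a stale copy; the cost is that no response is ever reused, even when the underlying data has not changed.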

    Wednesday, July 1, 2009

    The Journey Thus Far (Day 8)

    As I mentioned in the previous post, the first part of this project involved finding a way to make the Grid's "Logging and Bookkeeping" (L&B) data accessible to a Google Gadget. Initially it was suggested that I use a Python CGI script to simply parse the tab-delimited data files provided. This data could then be used to build a JSON object, using the Google Visualization API, which could be returned in response to a GET request sent to the CGI script. However, upon receiving a sample of the L&B data, it immediately became clear that this would not be a sensible approach: for just one week's worth of data, the uncompressed files exceeded several hundred MB. Using a single CGI script to parse, process and return the data would therefore mean reading all 882MB (or more, if additional weeks were considered) in response to every GET request. This would be very computationally inefficient and result in long delays for the end user (it is also likely that the GET request would time out before any data was returned).

    I therefore determined that the best approach would be to load the L&B data into a database, which could then be queried in response to a GET request, with the required JSON object built from the database's response. The architecture I had planned for the system looked something like this:

    For the RDBMS I am currently using the excellent open-source MySQL, even though the L&B data only forms a single table. Although I haven't quite reached the current state of the project, I think I'll end this post for now, as I feel an explanation of the basic structure/layout of a Google Gadget is needed before going any further.
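As a rough sketch of the database-backed approach described above, the code below splits the work into a one-off load of tab-delimited L&B records into a database, and a per-request query whose result is turned into JSON in the cols/rows layout accepted by the Google Visualization API's DataTable. It assumes Python's sqlite3 in place of MySQL and a hypothetical three-column extract (jobid, vo, submitted) rather than the real 37-field files; the function names are illustrative only.

```python
import csv
import io
import json
import sqlite3

# Hypothetical three-column extract; the real L&B files have 37 fields per job.
SAMPLE_TSV = (
    "jobid\tvo\tsubmitted\n"
    "job-1\talice\t2009-07-01 10:00:00\n"
    "job-2\talice\t2009-07-01 11:30:00\n"
    "job-3\tatlas\t2009-07-02 09:15:00\n"
)


def load_tsv(conn, text):
    """Load tab-delimited job records into a jobs table (done once, offline)."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    conn.execute("CREATE TABLE jobs (jobid TEXT, vo TEXT, submitted TEXT)")
    conn.executemany("INSERT INTO jobs VALUES (:jobid, :vo, :submitted)", reader)


def jobs_per_day_json(conn, vo):
    """Answer one request: query the database, then build DataTable-style JSON."""
    rows = conn.execute(
        "SELECT DATE(submitted) AS day, COUNT(jobid) FROM jobs "
        "WHERE vo = ? GROUP BY day ORDER BY day",
        (vo,),
    ).fetchall()
    table = {
        "cols": [
            {"id": "day", "label": "Day", "type": "string"},
            {"id": "jobs", "label": "Jobs submitted", "type": "number"},
        ],
        "rows": [{"c": [{"v": day}, {"v": count}]} for day, count in rows],
    }
    return json.dumps(table)


conn = sqlite3.connect(":memory:")
load_tsv(conn, SAMPLE_TSV)
print(jobs_per_day_json(conn, "alice"))
```

The point of the split is that the expensive parsing happens once, while each GET request only runs an indexed-friendly aggregate query over the stored rows. The day is sent as a string here to sidestep the DataTable's special date-literal syntax.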

    Tuesday, June 30, 2009

    A Brief Introduction

    Over the coming weeks I will be developing one or more iGoogle Gadgets with the aim of making the monitoring data recorded by "the Grid" more accessible. Whilst there are currently a range of tools available for visualising the Grid monitoring data, "it can still be quite difficult to find answers to many straightforward questions". I will be undertaking this project at the University of Birmingham over a 10-week period, which will hopefully result in the production of a genuinely useful grid monitoring tool.

    For the benefit of anyone without a physics background who happens to stumble upon this blog, "the Grid" is a large distributed network of computers predominantly used by particle physics researchers, including those involved in the LHC experiments at CERN. The Grid has a hierarchical design consisting of several higher-tier regional centres, each connected to many lower-tier nodes, totalling "more than 140 computing centres in 34 countries". The "distributed grid" approach was selected due to the large computing requirements of the LHC's data analysis.

    In this project I will specifically be working with the data recorded by the Grid's "Logging and Bookkeeping" service in conjunction with the various "Workload Management Systems" (WMS). This data consists of 37 fields of information relating to each "job" submitted to the Grid. Initially I will be dealing with the issue of making this data available in a form that can easily be accessed and read by the Google Visualization API. Once I have achieved this, I should be able to move on to creating gadgets to display this data in the most useful and informative way.

    So far I have been working on this project for 7 days. I intend to follow up this post with a summary of my progress thus far, as well as a regular series of progress reports.