June 30, 2003

#ifdef J2ME

Reading the source code for Simkin, I noticed some
//#ifdef
...
//#endif
tags wrapping things that aren't supported by MIDP. (Floating point, Reflection etc.) This seems to be a fairly reasonable use of #ifdef. Then I vaguely remembered (and confirmed) that Java doesn't have a preprocessor, and therefore doesn't support ifdef!

So... I wonder how Simkin is supporting this conditional compilation: is there an Ant task to preprocess versions of a source into appropriate directories and then build them?

And, if ifdefs are really considered such bad practice that the dubious miracle of "Write once run anywhere" was supposed to eliminate them, how should this be written?

From a vague recollection of (and sudden enlightenment on the purpose of) the Factory design pattern, I suppose that the code should be factored out into (taking the Interpreter class as an example)

  • AbstractInterpreter base class
  • J2SEInterpreter class. Includes methods which call objects via Reflection and others which add Float variables to other Floats and Integers.
  • J2MEInterpreter class. Includes only methods which call objects via a defined interface. Float objects are not used (substitute with some Fixed Decimal class).
  • The AbstractInterpreter class cannot be instantiated, but can act as a factory, returning the appropriate type of Interpreter class
  • Now it is easy for Ant to compile the classes without having to preprocess. One target can exclude source files ending with J2SE and the other J2ME.
OK, I started this as a rant about why #ifdef isn't supported, and I think I may have convinced myself that I don't need it. Any other comments from anyone who's come across this in J2ME or otherwise are very welcome!
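A minimal sketch of the factoring described above. All names and method bodies here are illustrative assumptions, not Simkin's actual code. The one design choice worth noting is that the factory looks up its subclasses by name with Class.forName (which, as far as I know, is available even in CLDC), so the base class has no compile-time reference to a subclass that an Ant target has excluded.

```java
// Hypothetical sketch: class names follow the bullet list above, bodies are invented.
abstract class AbstractInterpreter {
    /** Adds two operands; each platform decides which number types it supports. */
    abstract Object add(Object a, Object b);

    /** Returns the first implementation that was compiled into this build. */
    @SuppressWarnings("deprecation") // Class.newInstance() is the call a CLDC build would have
    static AbstractInterpreter create() {
        String base = AbstractInterpreter.class.getName();
        String[] candidates = { "J2SEInterpreter", "J2MEInterpreter" };
        for (int i = 0; i < candidates.length; i++) {
            // Build the sibling class name without referring to the class itself.
            String name = base.replace("AbstractInterpreter", candidates[i]);
            try {
                return (AbstractInterpreter) Class.forName(name).newInstance();
            } catch (Exception e) {
                // Class was excluded from this build (or cannot be created): try the next one.
            }
        }
        throw new IllegalStateException("no Interpreter implementation in this build");
    }
}

/** J2SE flavour: may use Float (and, in a real version, Reflection). */
class J2SEInterpreter extends AbstractInterpreter {
    Object add(Object a, Object b) {
        if (a instanceof Float || b instanceof Float) {
            return Float.valueOf(((Number) a).floatValue() + ((Number) b).floatValue());
        }
        return Integer.valueOf(((Number) a).intValue() + ((Number) b).intValue());
    }
}

/** J2ME flavour: integers only; a Fixed Decimal class would slot in here. */
class J2MEInterpreter extends AbstractInterpreter {
    Object add(Object a, Object b) {
        return Integer.valueOf(((Integer) a).intValue() + ((Integer) b).intValue());
    }
}
```

With both subclasses compiled in, create() prefers the J2SE one; a J2ME build that excludes J2SEInterpreter.java would fall through to the second candidate.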

(Googling "ant conditional compilation" finds various links such as this one which look relevant)
Update: Aha, it looks like Antenna, an Ant extension specifically for Wireless Java includes a preprocessor task among other goodies.

Posted by osfameron at 04:11 PM | Comments (1) | TrackBack

June 26, 2003

More Simkin overloading.

Thinking some more about overloading:

Evaluable is a bad name for the interface. I think I'll probably extract a number of overloaded interfaces and call them

  • OverloadValue (was Evaluable)
  • OverloadNumOp (was Overloadable)
  • OverloadCompare
  • OverloadStringOp
  • OverloadBooleanOp
Thinking about my interest in Simkin (modeling spreadsheet Cells), I came up with another issue: a Cell could contain
  • a String
  • a Boolean
  • an Integer
So far, so good: we just implement OverloadValue to return the appropriate thing. But it might also contain some other custom classes (for example to get around the lack of floating point support in MIDP 1.n)
  • GilgameshDecimal
  • GilgameshDate
How do we return these? We could implement OverloadNumOp instead. But it means that the Cell class has to handle even simple cases like adding Integers. This means reimplementing all the good work handled for us by the Simkin Interpreter!

So... I've come up with another class

  • OverloadProxy
This Interface contains a single method, public Object getProxy(). When an OpNode is encountered, if either side of the operator expression implements OverloadProxy then it will be replaced by the object returned by this method. The returned object could be an Integer, String etc. (which are handled internally by Simkin) or it could be an overloaded object of some sort which handles some or all of its operators by itself. (For example, the putative GilgameshDecimal object would implement OverloadNumOp.)

All this means that we can still use a containing object such as a Cell as an Object, and call methods on it; but also substitute it for actual values used in calculations.

=A1.address()   // calls address method on object A1
=A1 + A2        // converts A1 and A2 into their proxy values and operates 
                // on these.
Note that if the Cell contained a special value that it wants to handle itself, it can always return this in response to a call to getProxy()!
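A rough sketch of the proxy substitution, assuming only the getProxy() method named above. ProxyCell, OpNodeSketch, unwrap and add are hypothetical names for illustration; the real change would live inside Simkin's operator handling, which is not shown here.

```java
/** The single-method interface described above. */
interface OverloadProxy {
    Object getProxy();
}

/** Hypothetical container: still an object with methods, but stands in for a value. */
class ProxyCell implements OverloadProxy {
    private final String address;
    private final Object value;

    ProxyCell(String address, Object value) {
        this.address = address;
        this.value = value;
    }

    String address() { return address; }          // =A1.address()
    public Object getProxy() { return value; }    // what =A1 + A2 operates on
}

/** Hypothetical stand-in for the interpreter's operator node. */
final class OpNodeSketch {
    /** Replaces a proxying operand by its proxy value; other operands pass through. */
    static Object unwrap(Object operand) {
        return (operand instanceof OverloadProxy)
                ? ((OverloadProxy) operand).getProxy()
                : operand;
    }

    static Object add(Object left, Object right) {
        Object l = unwrap(left);
        Object r = unwrap(right);
        if (l instanceof Integer && r instanceof Integer) {
            return Integer.valueOf(((Integer) l).intValue() + ((Integer) r).intValue());
        }
        // Simplification: anything else is treated as string concatenation.
        return String.valueOf(l) + String.valueOf(r);
    }
}
```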

(Any similarity to Perl's overload mechanism may not be entirely coincidental). Perl operators can be overloaded one by one. But as far as I can see, this would require a Java interface for every operator! So grouping them into functional areas seems to be a good compromise.

Posted by osfameron at 01:51 PM | Comments (0) | TrackBack

More thoughts on Simkin

With the work on ranges nearing completion, and the expression order evaluator theoretically complete (if not actually implemented...) I've been looking at expression evaluators again.

Or, to be perfectly honest, at Simkin. It appeals to me because:

  • It's small (22Kb)
  • It's ported to J2ME
  • It's open sourced
(+ I met Simon Whiteside at the Exposium 2003 and he talked enthusiastically about it which was inspiring). The documentation for Simkin is nice, especially the API which is very well javadoc'd, but much of the background information is a bit light on detail for someone considering embedding it in their application. I'd have liked at least these documents:
  • A user-centric tutorial on the language, to demonstrate Simkin's strengths.
  • A deeper discussion on the ways that Simkin can be called (interpreters, contexts, XML, databases etc.)
  • An architectural overview of the entire system.
  • A tutorial on the Executable interface, and how to retrofit it to an existing Java object in order to script it.
I'd like to help document some of these points (certainly the last), given sufficient time...

The good news

Some aspects of Simkin fit perfectly with what I want out of Gilgamesh. An expression beginning with '=' can be parsed, and run in the context of an object, like a Cell. That object is now responsible for dispatching methods and resolving variables relating to it. An example would be:
Cell A1: =sum(B2, C2:D3)
The cell A1 will be asked to resolve the name "B2" and pass back an object to the Simkin interpreter. This is nice, as it means we can lazily provide cells and cell ranges to formulas! It also makes it trivial to implement the semantics I've suggested of dynamic cell references by column name:
Cell A1: =sum(B, C2:D3)
In this case, A1 will take the Column ref 'B' and return the Cell object in the same row (B1).
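The column-name rule could be sketched like this. NameResolver and resolve are hypothetical names, and the "all capital letters means a column" test is a simplifying assumption (it would misread an all-caps named range); a real version would return Cell objects rather than address strings.

```java
/** Hypothetical helper: turns a name used in a formula into an absolute address. */
final class NameResolver {
    /**
     * @param name the variable the interpreter asked for, e.g. "B" or "B2"
     * @param row  the row of the cell that owns the formula
     */
    static String resolve(String name, int row) {
        if (name.length() == 0) {
            return name;
        }
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c < 'A' || c > 'Z') {
                return name; // has a digit or other character: already a full reference
            }
        }
        return name + row;   // bare column: use the owning cell's row
    }
}
```

So a formula in A1 that mentions B resolves to B1, while B2 is passed through unchanged.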

Through a mysterious serendipity, colon delimited expressions are also passed to the object so C2:D3 in the example just works! Without customizing the parser at all! (However row ranges, like 1:10 won't work. Sigh. I guess a kludge like "row1:row10" would be ok?)

The sum() method will also be called on the cell. (That's not particularly useful in this case. But the cell can just pass this up to Gilgamesh's function dispatcher.)

The bad news

For some reason, I'd imagined that cell names would be resolved to objects at compile time. But in Simkin the cell name is stored in the parse tree, and the object is only resolved at run-time. This means that if we insert or delete rows and columns, the name could risk pointing to the wrong cell!

But there is nothing to prevent the Cell caching the name of the reference with the object itself in a Hashtable. The next time Simkin asks for "A1", the Cell could return the right object, even though in the meantime it's been moved to Z100!
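A sketch of that cache, under the assumption that the grid can be modelled as a Hashtable from address to cell. MovableCell and ReferenceCache are hypothetical names, and generics are used for readability even though a MIDP 1.0 compiler would need the raw types.

```java
import java.util.Hashtable;

/** Hypothetical cell that knows its current address. */
class MovableCell {
    private String address;

    MovableCell(String address) { this.address = address; }

    String getAddress() { return address; }
    void moveTo(String newAddress) { address = newAddress; }
}

/** Remembers which object each name resolved to the first time it was asked for. */
class ReferenceCache {
    private final Hashtable<String, MovableCell> cached = new Hashtable<String, MovableCell>();

    MovableCell resolve(String name, Hashtable<String, MovableCell> grid) {
        MovableCell cell = cached.get(name);
        if (cell == null) {
            cell = grid.get(name);          // first request: look at the grid as it is now
            if (cell != null) {
                cached.put(name, cell);     // later requests return the same object
            }
        }
        return cell;
    }
}
```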

The problem finally is what happens when the user tries to edit a formula containing references to cells that have changed addresses? We can't just return the formula string that was originally passed in. The user might have input "=sum(B2, C2:D3)", but we may need to return "=sum(A1, G10:H15)". I can think of two solutions:

  1. When the user wants to edit a formula: Walk the parse tree substituting any variable matching a cached name with the new name belonging to the cached object. (Now we need to deparse the parse tree and return the String to the user...)
  2. Splice in the new names for the cell ranges into the original formula String. This would save some faffing with parse-tree walking (not to mention the deparsing code). But I've not worked out if it's possible to find out which position in the formula string the references are.
Though I think the second version would be faster, it requires a better understanding of the compiler than I've managed to glean so far, and also could mean storing the column offset of every parse-node, which might be too heavyweight. In the meantime, I've started work on a basic deparser.
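For what it's worth, the splicing in option 2 is simple once the offsets are known. This sketch assumes the parser could hand back the start and end position of every reference, which is exactly the open question; FormulaSplicer and its parameters are hypothetical.

```java
/** Hypothetical helper for option 2: rebuild the formula text around renamed references. */
final class FormulaSplicer {
    /**
     * @param formula  the text the user originally typed
     * @param starts   start offset of each reference, ascending and non-overlapping
     * @param ends     end offset (exclusive) of each reference
     * @param newNames the current name of the object each reference points to
     */
    static String splice(String formula, int[] starts, int[] ends, String[] newNames) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        for (int i = 0; i < starts.length; i++) {
            out.append(formula.substring(pos, starts[i])); // untouched text before the reference
            out.append(newNames[i]);                       // the reference under its new name
            pos = ends[i];
        }
        out.append(formula.substring(pos));                // whatever follows the last reference
        return out.toString();
    }
}
```

For "=sum(B2, C2:D3)" the references sit at offsets 5 to 7 and 9 to 14, so splicing in "A1" and "G10:H15" gives "=sum(A1, G10:H15)".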

Evaluable

Simkin will automatically cast variables to float, int, string or boolean as appropriate. Strings will get parsed into numbers; Booleans are true if they are non-zero numbers or the string "true" etc. This is quite clever and time-saving. (This sort of clever dynamic Do-What-I-Mean-ing is one of the reasons that I like Perl, for example.)

Gilgamesh Cells can contain numbers, Strings, booleans, Dates etc. This means that as Simkin stands, if a Cell contains an integer it would first be stringified with toString() and then parsed back into an integer! Also, it'd be nice to overload the object to have finer control over how its values are used in calculations. As a simple example, an object could return "Two" as a String, and '2' as an integer.

As another example, we might want an Error value to exist. If it's wrapped by a function (=if(iserror(A1), "!!!", A1)) then just evaluate it. But if we try to get its value (=A1 + 3, where A1 contains an Error value) then it will throw an exception which will cause the entire expression to return an Error.

I've created a proof of concept interface Evaluable, and modified some of the methods in Interpreter.java (intValue, boolValue etc.) to use its behaviour where possible.
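Roughly what such an interface could look like. The method names on Evaluable are guesses modelled on the intValue/boolValue methods mentioned above, and TwoValue, ErrorValue and Coercion are hypothetical; this is not the proof-of-concept code itself.

```java
/** Hypothetical shape of the interface: the object decides how it is seen as each type. */
interface Evaluable {
    int intValue();
    String stringValue();
}

/** The "Two" as a String, 2 as an integer example. */
final class TwoValue implements Evaluable {
    public int intValue() { return 2; }
    public String stringValue() { return "Two"; }
}

/** An Error value: harmless to pass around, but asking for its value throws. */
final class ErrorValue implements Evaluable {
    public int intValue() { throw new IllegalStateException("cell contains an error"); }
    public String stringValue() { throw new IllegalStateException("cell contains an error"); }
}

/** Stand-in for the modified interpreter method. */
final class Coercion {
    static int intValue(Object o) {
        if (o instanceof Evaluable) {
            return ((Evaluable) o).intValue();      // no stringify-then-parse round trip
        }
        if (o instanceof Integer) {
            return ((Integer) o).intValue();
        }
        return Integer.parseInt(String.valueOf(o)); // fallback: parse the string form
    }
}
```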

Overloadable

Then I went slightly over the top and wrote a proof of concept for arithmetic overloading... As J2ME doesn't support floating point, I'll want to implement a Fixed Decimal class at some point. Also, I want to support dates. Dates can be added to numbers (21st January + 1 = 22nd January) but not to other dates (21st January + 10th June = ???). To support all this complexity in the interpreter itself is going to be nasty without some hefty refactoring. So I hit on a solution of extracting operations on conforming Objects into a nice interface (Overloadable) and letting the classes that implement it worry about how to operate on each other...

The Overloadable interface defines methods like .add, .subtract etc. When the Interpreter wants to (say) add two values, it will check if either of them is Overloadable. If so, it'll ask this object to handle the operation with the other value as its parameter. As an object could be asked to operate with any other Class, the second parameter will be passed as an Evaluable object to ensure a consistent interface.

(If neither object was Overloadable, then we fall back to operating on them as Integers or Floats as appropriate.)
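The dispatch could be sketched as below, using the date example. To keep the block short the second parameter is a plain Object rather than an Evaluable, dates are reduced to a day number, and only add is shown; DayDate and AddDispatch are hypothetical names.

```java
/** One method of the interface described above; the real one would also have subtract etc. */
interface Overloadable {
    Object add(Object other);
}

/** Hypothetical date, reduced to a day number for illustration. */
final class DayDate implements Overloadable {
    final int dayNumber;

    DayDate(int dayNumber) { this.dayNumber = dayNumber; }

    public Object add(Object other) {
        if (other instanceof Integer) {
            return new DayDate(dayNumber + ((Integer) other).intValue()); // date + number
        }
        throw new IllegalArgumentException("cannot add " + other + " to a date"); // date + date
    }
}

/** Stand-in for the interpreter's add operator. */
final class AddDispatch {
    static Object add(Object left, Object right) {
        if (left instanceof Overloadable) {
            return ((Overloadable) left).add(right);
        }
        if (right instanceof Overloadable) {
            return ((Overloadable) right).add(left); // assumes addition is commutative
        }
        // Neither side is Overloadable: fall back to plain integer arithmetic.
        return Integer.valueOf(((Integer) left).intValue() + ((Integer) right).intValue());
    }
}
```

Swapping the operands when only the right-hand side is Overloadable is fine for add, but a non-commutative operator like subtract would need a separate "reversed" method or flag.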

So...

I'm glad that I don't have to implement my own macro language from scratch. Simkin has some really nice features, and I'm happy to work with it and possibly contribute something to it. (Of course now I have to hope that these ideas work with the development of Simkin in general).
Posted by osfameron at 01:26 AM | Comments (0) | TrackBack

June 13, 2003

Grids and Subgrids

I've been thinking a while about Lists actually being represented as Grids which are magically embedded into the top-level Grid. After waffling about it in the last 2 entries I'm ever more convinced that this is the way to go.

This is potentially much more complex than I'd planned, but it makes some things simpler. Some notes:

  • Though the data will be stored in the sublist, in the display the cells will be contiguous with the main grid (and in fact could be selected by the main grid too). For example, B5 could be Sales[1] (i.e. row 1 in column Sales).
  • For convenience, the UI will show list headers instead of A B C D etc. if you are in the middle of a list. (In Excel you have to explicitly request this behaviour with Split Window and Freeze Panes).
  • If you copy a cell containing a reference to a list column (=Sales_Result*2) to somewhere outside the list, i.e. where this column doesn't exist, the cell should set itself to an error.
  • (On the other hand, the main columns A B C D etc. should be referrable within the sublists too).
  • Sorting a sublist will be easy because you can just order the rows in the grid. Otherwise you'd have to buffer and copy cells around which would be a pain. (Or only sort by whole-row, which is admittedly very common: for example Excel still reverts to whole-row sorting under some circumstances, like when you share a worksheet!)
  • Curses: MIDP doesn't appear to have a sort method on Vectors! Time to look at those algorithm books again...
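In the meantime, a hand-rolled insertion sort is probably enough for the short lists a small device will hold. This sketch sticks to the elementAt/setElementAt calls that Vector has had from the start; RowOrder is a hypothetical stand-in for a comparator, and the generic Vector type is only there to keep a modern compiler quiet.

```java
import java.util.Vector;

/** Hypothetical ordering callback. */
interface RowOrder {
    boolean lessThan(Object a, Object b);
}

final class VectorSort {
    /** Sorts the Vector in place; stable, and quick on nearly-sorted data. */
    static void insertionSort(Vector<Object> rows, RowOrder order) {
        for (int i = 1; i < rows.size(); i++) {
            Object current = rows.elementAt(i);
            int j = i - 1;
            // Shift larger elements one slot right until current fits.
            while (j >= 0 && order.lessThan(current, rows.elementAt(j))) {
                rows.setElementAt(rows.elementAt(j), j + 1);
                j--;
            }
            rows.setElementAt(current, j + 1);
        }
    }
}
```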
Posted by osfameron at 11:28 PM | Comments (0) | TrackBack

Spreadsheets and $Horrible$Syntax

Yuk I hate this spreadsheet syntax:
$A$1:$F$100
I've always hated it, and I don't want to have to use it in Gilgamesh. On the other hand it's clear why it's needed: the kind of formula I described in the last entry. When copying or filling cells, you want some of them to offset their references and some of them not to. By placing the dollar ($) sign before any reference that doesn't change, you have fine grained control over how this is done.

My thoughts on this are as follows.

  • Gilgamesh will be a light-weight spreadsheet designed for small devices. I don't want to sacrifice power if I don't have to, but if I can get simplicity and usability by sacrificing it then so be it.
  • If I've referred to a cell in the same row, then it's likely that I'll want the reference to offset downwards. Whereas if I've referenced a cell in another row then it's likely to be a constant value. So I can use the heuristic:
    • Offset any references that are in the row.
    • Keep references outside the row fixed.
  • On the other hand, if I've referred to a constant value, then I would probably have been better off giving it a name. =150*A5 tells me nothing about the formula, whereas =150*GBP_to_EUR gives me a named reference (which won't be offset) and tells me that I'm converting 150 pounds Sterling into Euros.
    • Offset any direct references
    • Keep references to names fixed.
  • I don't think I've ever seen any reason to offset an entire range. This behaviour causes me nothing but irritating bugs like the above.
    • Consider only ever offsetting single-cell references.
    • This is open to reasoning: if your spreadsheet usage is different to mine and you find this useful, let me know.
  • With the concept of row-scoped names, we could insist that no offsetting is ever done!
    • References to cells on same row actually only maintain a reference to the corresponding column. The actual cell referred to is dynamically calculated.
    • References to actual cells are never offset.
    • A1: =B+B1 would be equivalent to =B1*2, however if filled down (or moved) A2: =B+B1 dynamically becomes =B2+B1.
    • I originally thought that this would have an impact on the evaluation ordering (Previously described in detail). But in fact it should be minimal. When the formula is calculated it will return references to the actual cells (B1, B2) involved. However, internally it will only store the reference to the column. This means that every time the Orderer notices a change it will deal with registering and deregistering the Cell from the evaluation tree as appropriate.

      The only additional complexity is that the formula becomes slightly range-volatile. That is, it is affected by movements of the cell between lines, but not by insertions/deletions in the Grid that happen to change the line's address. This could easily be represented by an additional special node in the evaluation graph.

I hope that it's clear that there isn't a Right Way to do things. But I think that there may be a better way than the current standard semantics, where better in this case means "more optimized for the simplest possible task that someone would want to do on a constrained device like the SE P800". NB: we will probably offer an OFFSET() formula just like Excel. But it'd be nice to need it only for the complex stuff, and make the easy stuff easier!
Posted by osfameron at 11:27 PM | Comments (0) | TrackBack

Excel's dynamic named ranges considered harmful.

Though I'm stuck in nitty-gritty implementation work, and this isn't a priority for some time yet, I thought I'd whine for a moment about Microsoft Excel's dynamic ranges.

Lists

I usually use Excel for managing lists. Lists have a header row with the title of the column ("Name", "Country", "Score") and any number of rows underneath them. You will often want to write formulas that act on a whole column: for example
  • SUM: add up the figures in a column
  • VLOOKUP: look up a value in one column and return its corresponding value. (For example Name -> Address lookup)
That's great if you already know how many rows there are going to be. Otherwise you can either
  • Insert a new row in the middle of the range. (Because the range runs from the first row object to the last row object this just works.)
  • Start inputting the data on the next row after the last row in the list. This, on the other hand, won't work. The range that you referred to in the formula doesn't change. So you're going to have to dig up any formulas that need to be changed and modify them by hand.
You could make your range into a whole column (=SUM(A:A)) but this could be wasteful (though I don't doubt that Excel optimizes this case and doesn't scan thousands of empty cells unnecessarily) and is also unsubtle (if you have any cells below or above the list, they'll be included too - including the Header row itself!)

Dynamic Named Ranges

Excel's answer to this is Dynamic Named Ranges. From the useful OzGrid page on the subject:
Possibly one of Excels most underutilized aspects is its ability to create dynamic named ranges that will expand and contract according to the data in them.
Let's see how you create a named range.
Go to: Insert>Name>Define and in the Names in workbook box type any one word name (I will use MyRange) the only part that will change is the formula we place in the Refers to box.
OK so far, note that you can't just access a dynamic range from the formula bar, they have to be named (hence "dynamic named range")... Now let's see an example of a formula defining a dynamic range:
1: Expand Down as Many Rows as There are Numeric Entries.
    In the Refers to box type: =OFFSET($A$1,0,0,COUNT($A:$A),1)
And at this point you might already understand why this functionality is "underutilized" ;->

The OFFSET() function takes these arguments

  1. the initial range ($A$1)
  2. the head row offset (0 - the row remains 1)
  3. the head column offset (0 - the column remains A)
  4. the tail row offset (COUNT($A:$A) - count the numeric entries in column A and offset the end of the range by that number)
  5. the tail column offset (1 - being the width of the new range)
The logic is sound, but the syntax is spuriously ugly for something so simple.
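To make the arithmetic concrete: in Excel's own terms the five arguments are the reference, a row offset, a column offset, a height and a width. The sketch below works out the resulting range for a single-cell starting reference; OffsetSketch is a hypothetical helper that only handles columns A to Z.

```java
/** Hypothetical illustration of how OFFSET's arguments combine into a range address. */
final class OffsetSketch {
    /** Rows and columns are 1-based, so A1 is (1, 1). */
    static String offset(int startRow, int startCol, int rows, int cols, int height, int width) {
        int top = startRow + rows;          // move the top-left corner down by 'rows'
        int left = startCol + cols;         // ... and right by 'cols'
        int bottom = top + height - 1;      // the range is 'height' rows tall
        int right = left + width - 1;       // ... and 'width' columns wide
        return colName(left) + top + ":" + colName(right) + bottom;
    }

    /** 1 becomes "A"; only single-letter columns are supported in this sketch. */
    static String colName(int col) {
        if (col < 1 || col > 26) {
            throw new IllegalArgumentException("column out of range: " + col);
        }
        return String.valueOf((char) ('A' + col - 1));
    }
}
```

With five numeric entries in column A, =OFFSET($A$1,0,0,COUNT($A:$A),1) corresponds to offset(1, 1, 0, 0, 5, 1), which is the range A1:A5.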
3:Expand Down to The Last Numeric Entry
    In the Refers to box type: =OFFSET($A$1,0,0,MATCH(1E+306,$A:$A))
If you expect a number larger than 1E+306 (a one with 306 zeros) then change this to a larger number.
This is quite clever in a horrible-kludge way. MATCH() with these parameters finds the largest number in an ascending list that is equal to or less than the parameter. As it's expecting an ascending list, I guess that it optimizes by starting the search from the end and looking backwards. Because the first number that will be found is bound to be smaller than the very large number provided, this hack will work.

I don't want to come across as criticising the OzGrid site or the useful examples that it provides. Certainly for the advanced searches that it goes on to describe ("Expand down one row each week", etc.) this syntax is flexible and powerful. But it's fairly clear from the entries that I quoted that a main use is to sum or manipulate columns in a list. And this really should be simpler! Indirection like that used in OFFSET, coupled with bizarre hacks to locate the endpoints, is liable to be tricky to use, debug, and maintain, and to cause subtle errors.

A new hope

I'm thinking along these lines for Gilgamesh:
  • Every list will have dynamic named ranges created automatically for each column.
  • These ranges will usually be named after the header row. For example a Column header which has "Q4 Sales" might generate a dynamic range called "Q4_Sales".
  • An external formula can now simply do =sum(Q4_Sales).
  • Of course if someone had already registered a range called Q4_Sales we'll have to do something appropriate mumble mumble handwave handwave ;->
  • If we change the header text to "Q4 Sales figures", the range will rename itself to "Q4_Sales_figures", and because all referencing formulae are pointing at an object, not a String, they will magically have rewritten themselves (=sum(Q4_Sales_figures)).
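The "pointing at an object, not a String" idea could look something like this. ColumnRange and SumFormula are hypothetical names, and replacing spaces with underscores is only the simplest possible naming rule (clashes with existing names are not handled).

```java
/** Hypothetical dynamic named range whose name is derived from the column header. */
final class ColumnRange {
    private String header;

    ColumnRange(String header) { this.header = header; }

    void setHeader(String header) { this.header = header; }

    /** "Q4 Sales" becomes "Q4_Sales". */
    String getName() { return header.trim().replace(' ', '_'); }
}

/** Hypothetical formula that keeps a reference to the range object, never to its name. */
final class SumFormula {
    private final ColumnRange target;

    SumFormula(ColumnRange target) { this.target = target; }

    /** The text shown to the user is generated from the range's current name. */
    String getText() { return "=sum(" + target.getName() + ")"; }
}
```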

Single cell ranges

And something that might be more controversial: single cell dynamic named ranges.
  • Within a list, you are more likely to refer to a cell within your row than to a whole column. For example, consider the columns
    • First Name
    • Last Name
    • Full Name
    You'd like Full Name to be automatically generated. I think a formula like
    Full_Name: =First_Name+" "+Last_Name
    is about as clear as you can get, and certainly clearer than the kind of thing I write in Excel (=A5+" "+B5). So, within a row in a list, names will be scoped by default to the row itself.
  • Of course you might want to be able to refer to a whole column, for example to compare a score with all the competitors (=(Score - AVERAGE(Score))/100) But this won't work: how can we tell that the first 'Score' is a range value but the second is a single value?

    I think it would make sense for a column in a list to take on the semantics of a column in a grid. So how about (Score - AVERAGE(Score:Score))/100? The construct : is parsed as a column range (A:B etc.) so this is relatively intuitive. But it's ugly, and it's a lot of typing. Actually, on a pen-based device like those targeted by Gilgamesh it's a lot of hand-writing recognition ;-> Perhaps we can assume that a column range where one column is omitted is actually a single column:

    (Score - AVERAGE(Score:))/100
    I had considered going the Perl way and having an array sigil (Score - AVERAGE(@Score))/100 which stands out more. But I like the idea of treating columns of a list just like columns of a Grid. After all you can sort by them, which is an extension of this abstraction. It'd be nice to have consistency.

Consistency

If we want consistency, then I suppose that we should be able to refer to the Grid columns (A B C D E ...) within a row.
C1: =A+B (i.e. the same as =A1+B1)
This could help resolve an error that I often make: when I was testing the Average example above, I filled in A1:A4 in an Excel worksheet, and then put in A1: =(A1 - AVERAGE(A1:A4))/100 which gave the right result. But when I filled down, I forgot as I so often do that this doesn't just move A1...
  • =(A2 - AVERAGE(A2:A5))/100
  • =(A3 - AVERAGE(A3:A6))/100
  • ...
I don't want A1:A4 to change... should have put A$1:A$4. Which leads me to my next rant.

UPDATE (2003-06-19): From this page, I found this tip

[Ctrl][Shift][F3]: Automatically creates Named Ranges from the headers for the selected table of data with row or column headers.
This seems to do something similar to what I want, but has some oddities, which I will play with
Posted by osfameron at 11:24 PM | Comments (0) | TrackBack

IDEoblogy

From early on, it occurred to me that this blog could form an informal basis to documentation for the project, and I'm planning to include an export of the blog along with any distributed documentation.

Now, dsuspense has noted a possible convergence between IDE and Blog. (Apparently after reading Russell Beattie's entry on blogging getting in the way of actually coding. Oh how I know that feeling! But a blogging IDE could really help make blogging a seamless, managed, integrated activity.)

I've had a few thoughts about how a blogging plugin could work:

  • Within the Code window you decide to blog (shortcut key, button etc.)
  • A blog editor comes up allowing you to choose the topic of your blog (Selected code fragment, this method, this class/file, whole project).
  • You can choose a working title, and make a few notes.
  • A link to this blog fragment is visible in the margin of the fragment/ method/class/file, and you can edit at any time.
  • When you are ready to publish, you can view all the current fragments, and you may choose to collate them (you could sort fragments about a particular class together for example).
  • You can then edit the entire entry, and post.
  • Appropriate comments with a permalink to the resulting post will be added at the relevant places.
  • Presumably, the text you've submitted would be searched for Class/method names, which you could prompt to update as well (if you've mentioned them more than tangentially - I guess this would have to be a manual choice?)
The blog will also be stored locally, and can be accessed easily and integrated into other working routines. For example:
  • Blogs for a particular method can be instantly brought up (compare IDEA's Ctrl-Q for quick Javadoc).
  • Automated code review can check for code coverage in the blog. "Have I written about class Foo yet?"
As the IDE is unlikely to be the only blogging tool used, it would have to be able to refresh itself from an RSS feed, scan entries, and import the relevant ones into the local database (and add comments/links in the code). This will have a few side effects:
  • When working on a project, you could import multiple RSS feeds so that various contributors' comments on the code get imported automatically.
  • Set java.blogs as an RSS feed and use it as a vanity agent to see if anyone else is talking about you ;->
Posted by osfameron at 08:43 AM | Comments (0) | TrackBack

June 12, 2003

IntelliJ: 7 days to go

I now have use of the IntelliJ trial for another 7 days.
Sadly, due to other time-commitments, this doesn't mean that I've had the opportunity to actually use IntelliJ for the elapsed 23 days of the trial. But I've been impressed enough to be a little sad that it finishes so soon. Some thoughts:

Yes, it does make sense to move from an editor that you love to an IDE. You do lose some expressive power of the keyboard mappings of the editor (and no, I've not tried the vim plugins available) but you gain:

  • Syntax checking: I'm relatively new to Java, and I still make a lot of mistakes. With Vim, I got used to finishing a piece of work, running my ant script and mentally placing a guess on how many compilation errors I'd have caused. With unobtrusive highlighting and hints, I very rarely have to go back and debug for them.
  • Refactoring: One of those things that you never realize you need till you need it. Of course you can rename a method by grepping through all buffers and changing it manually each time. But the built-in refactorings are quick, syntax aware, can check documentation, and have a nice persistent results window so that you can go through any required manual changes without losing your place.
  • Autocomplete: I love Vim's autocomplete - it's quick and powerful. But Vim doesn't know Java, and IntelliJ (which does) can really add value by only autocompleting with something that's sensible in the context.
  • Intention actions: I love that in class Foo you can write a non-existent method: bar.doBaz(this); and with a couple of clicks be transported to class Bar with a function already fleshed out for you
    public void doBaz(Foo f) {
    	// generated stub, body left for you to fill in
    }
    Similarly, it's easy to flesh out getters and setters, interface or abstract methods that need to be filled in, and the like.
  • Automatic imports: No more scouring the API for which package an imported class is in.
  • Editor niceties: I love the way that typing a quote (") automatically generates the closing quote too (""). This feature could have been intrusive but isn't: if you type the closing quote anyway, IDEA just hops your caret after both quotes rather than ending up with 3 ("""). Same with parentheses.
  • Todo, error, and warning indicators: To the right-hand side of the edit window appear coloured bars indicating the relative position in your file of various things: Yellow for Warnings (like unused fields), Red for compile errors, Blue for TODO items. Clicking on these bars takes you to the relevant line. I found this interface a bit odd at first, but it's very useful.
  • Project view: For ease of hopping between files. I know that Vim can hop buffers using the :bu firstfewcharactersofname which is nice, but actually just choosing the class from a constantly visible window is great. I was surprised how much I liked doing this.
I have only a few concerns with it:
  • The garbage collection seems a bit slow, over 10 seconds when the whole application just hangs. As there's no warning this is a bit bizarre (but you can see from the memory usage bar at the bottom-right before and after what it's been doing). Considering the productivity gain elsewhere, I can live with it, but it is a little annoying.
  • The transparent windows: (no actually I think you can turn them off, in fact the user interface is very nice and uncluttered).
  • I sometimes found the 'go back' function (Ctrl-Alt-Left) would get confused with 'go to previous tab' (Alt-Left).
Oh, and the price. Software development is not my main job, and I have to think really hard about whether I can afford this. (But I will think hard).
Next... Eclipse.
Posted by osfameron at 02:51 PM | Comments (0) | TrackBack

Dynamic grid data-structure: autovivifying cells

The coding is going well though, due to lack of time, much more slowly than I'd like. So I'm still working on the container model (Grid, Axis, Rows, Columns, Cells), which turns out to be more interesting than I'd thought.

I'd originally planned to have a fixed size grid, which could be expanded if necessary. Microcalc works this way: you specify a starting size (8 x 32 by default) but can insert/delete rows later to change the size.

Dynamic grids

Then I decided that I wanted the grid to be more dynamic: Rows and Columns will be appended when needed, and deleted when not. If I ask for a cell (E111) or a range (A1:Z100) then that object, plus any relevant structure like the Rows and Columns that contain it should also be returned. But only if they are required! To explain what I mean a bit better:

  1. In the User Interface, user clicks on cell E13.
  2. I want to be able to return the Formula of that cell, even if it doesn't exist.
  3. If the user modifies the formula, I want to be able to write cell.setFormula(newFormula) and transparently handle the cases where in fact that cell, and the row and column it's in don't exist.
  4. If however the user decides to move on and click on another cell, I don't want to have created a cell, rows and columns etc. that are now unused.
My solution for this is to return a proxy-object if the real object doesn't exist. Actually, I don't think that Java supports proxies as such (like Mark Overmeer's Perl module Object::Realize::Later) which dynamically change their class to the requested one when certain conditions are fulfilled. So I'm actually settling for these definitions:
  • Real: an object that is referenced by its container object. (A Cell in a Line, a Line in an Axis)
  • Proxy: an object which knows which container it should be in. However, its container does not have a reference to it.
Here's an example of the usage of these proxy objects:
  1. We request cell at (3,3)
  2. Grid gets Row 3 and Col 3
  3. ... but there are currently 0 rows. The Row LineAxis creates a proxy Line at order 3 and returns it.
  4. ... and the same for the Column LineAxis. NB: neither LineAxis actually keeps any reference to these proxy objects.
  5. Grid checks if ProxyRow 3 contains a Cell intersecting with ProxyColumn 3.
  6. ... it doesn't of course. So it returns a Proxy Cell which claims to be in Row 3 and Col 3.
  7. If we later discard the cell then the Proxy Row and Col objects have no references to them, so they will be discarded.
  8. But: if we update the cell, it will now try to upgrade itself into a proper cell by calling its registerSelf() method.
  9. This will ask both its Row and Col to register it. They will now maintain a reference to the cell.
  10. As soon as the Proxy Lines register the cell, they will also have to register themselves (again by calling a registerSelf() method).
  11. The respective LineAxis will note the current size (0), and the new order (3).
  12. LineAxis first appends 3-0=3 new lines to the Axis, and then appends the proxy Line.
  13. It doesn't now matter if we lose the reference to the Cell, as it is now properly registered in the Grid structure.
  14. The next time we request cell at (3,3)
  15. Grid gets Row 3 and Col 3
  16. ... this time these are real rows.
  17. ... and checks if Row 3 contains a cell intersecting with Col 3. This time it does and the cell is returned.
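
The walkthrough above can be sketched roughly as follows. This is only a minimal illustration, not the real code: every class name here (SketchGrid, SketchAxis, SketchLine, SketchCell) is made up for the example, it uses modern java.util collections and generics for brevity rather than anything J2ME-friendly, and it ignores deregistration and stale proxies entirely.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical axis: holds only the real (registered) lines.
class SketchAxis {
    final List<SketchLine> lines = new ArrayList<>();

    // Return the real line if one exists, otherwise a proxy.
    // The axis keeps no reference to the proxy it hands out.
    SketchLine get(int order) {
        return order < lines.size() ? lines.get(order) : new SketchLine(this, order);
    }

    // Called by a proxy line that wants to become real.
    void register(SketchLine line) {
        while (lines.size() < line.order) {
            lines.add(new SketchLine(this, lines.size())); // blank filler lines
        }
        if (lines.size() == line.order) {
            lines.add(line);
        }
        // A line already present at this order is silently ignored in this sketch.
    }
}

// Hypothetical line (a Row or a Column), real or proxy.
class SketchLine {
    final SketchAxis axis;
    final int order;
    // Cells keyed by the order of the line they intersect on the other axis.
    final Map<Integer, SketchCell> cells = new HashMap<>();

    SketchLine(SketchAxis axis, int order) {
        this.axis = axis;
        this.order = order;
    }

    void register(int otherOrder, SketchCell cell) {
        cells.put(otherOrder, cell);
        axis.register(this); // a line that holds a cell must itself be real
    }
}

// Hypothetical cell: a proxy until something is written to it.
class SketchCell {
    final SketchLine row;
    final SketchLine col;
    String formula = "";

    SketchCell(SketchLine row, SketchLine col) {
        this.row = row;
        this.col = col;
    }

    void setFormula(String newFormula) {
        formula = newFormula;
        registerSelf(); // upgrade from proxy to real on first update
    }

    void registerSelf() {
        row.register(col.order, this);
        col.register(row.order, this);
    }
}

class SketchGrid {
    final SketchAxis rows = new SketchAxis();
    final SketchAxis cols = new SketchAxis();

    SketchCell cellAt(int r, int c) {
        SketchLine row = rows.get(r);
        SketchLine col = cols.get(c);
        SketchCell existing = row.cells.get(c);
        return existing != null ? existing : new SketchCell(row, col);
    }
}
```

With this sketch, calling cellAt(3, 3) on an empty grid leaves both axes empty; only setFormula(...) on the returned cell makes the axes grow to four lines each, after which cellAt(3, 3) returns that same cell object.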

Interfaces

I coded this rather ad-hoc, but will refactor into 2 interfaces:
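// Implemented by anything that may or may not belong in the Grid (Cells, Lines).
public interface GridComponentCandidate {
	// True when the component meets the conditions for being registered.
	public boolean isInteresting();
	// Ask the registrars to register / deregister this component.
	public void registerSelf();
	public void deRegisterSelf();
	// Subscribe / unsubscribe a container that should be told about this component.
	public void addRegistrar(GridComponentRegistrar g);
	public void removeRegistrar(GridComponentRegistrar g);
}
// Implemented by containers (a Line for its Cells, a LineAxis for its Lines).
public interface GridComponentRegistrar {
	public void register(GridComponentCandidate c);
	public void deRegister(GridComponentCandidate c);
}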

Interesting?

The method isInteresting() returns the conditions under which a Component needs to be registered. For example, a Cell will be registered if it has (any one):
  • a value
  • a formula
  • a format
  • dependencies (one or more nodes refer to it)
  • a name/alias
A Line will be registered if it has (any one):
  • one or more registered cells
  • one or more registered ranges
  • a custom name (?)
Each GridComponentCandidate object will check whether it has become interesting (or vice versa) in any method that updates any of these criteria.
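
As a rough sketch of that last point, here is one way a cell might funnel every update through a single check. The class InterestCell, its fields and the checkInterest() helper are all invented for this example, and the values are plain Strings just to keep it short; the real Cell will look different.

```java
// Hypothetical cell that re-evaluates its "interesting" status on every update.
class InterestCell {
    private String value;       // null means "not set"
    private String formula;
    private String format;
    private String alias;
    private int dependents;     // number of nodes that refer to this cell
    private boolean registered;

    // Interesting if any one of the criteria holds.
    public boolean isInteresting() {
        return value != null || formula != null || format != null
                || alias != null || dependents > 0;
    }

    public boolean isRegistered() { return registered; }

    public void setValue(String v)   { value = v;    checkInterest(); }
    public void setFormula(String f) { formula = f;  checkInterest(); }
    public void setFormat(String f)  { format = f;   checkInterest(); }
    public void setAlias(String a)   { alias = a;    checkInterest(); }
    public void addDependent()       { dependents++; checkInterest(); }

    public void removeDependent() {
        if (dependents > 0) {
            dependents--;
        }
        checkInterest();
    }

    // Every mutator ends up here, so a change of status is never missed.
    private void checkInterest() {
        boolean now = isInteresting();
        if (now && !registered) {
            registerSelf();
        } else if (!now && registered) {
            deRegisterSelf();
        }
    }

    // In a fuller version these would notify the cell's Row and Col.
    void registerSelf()   { registered = true; }
    void deRegisterSelf() { registered = false; }
}
```

The point of the single private helper is that adding a new criterion later only means touching isInteresting() and the new setter.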

Subscribe relationship

This is basically a subscribe/callback relationship, so I guess addRegistrar(...) and removeRegistrar(...) are modeled on the Listener interface. IIRC, J2ME doesn't have a generic Listener interface to inherit from, so I'm keeping it light-weight by not coding one up (but this decision may be wrong and I'll change it if so). Originally I hardcoded the registerSelf() methods (Cells had to update their Row and Col; Lines had to update their LineAxis) but I think it may be more flexible (for example, for testing) to use this model.
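
To illustrate why this might help with testing, here is a small sketch. The two interfaces are the ones from the listing above (without the redundant public modifiers); AbstractCandidate and RecordingRegistrar are hypothetical helpers I've made up for the example, and the java.util generics are used for brevity rather than J2ME compatibility.

```java
import java.util.ArrayList;
import java.util.List;

interface GridComponentRegistrar {
    void register(GridComponentCandidate c);
    void deRegister(GridComponentCandidate c);
}

interface GridComponentCandidate {
    boolean isInteresting();
    void registerSelf();
    void deRegisterSelf();
    void addRegistrar(GridComponentRegistrar g);
    void removeRegistrar(GridComponentRegistrar g);
}

// Hypothetical base class: keeps the subscriber list so that Cells and
// Lines don't each have to hardcode who they report to.
abstract class AbstractCandidate implements GridComponentCandidate {
    private final List<GridComponentRegistrar> registrars = new ArrayList<>();

    public void addRegistrar(GridComponentRegistrar g) {
        if (!registrars.contains(g)) {
            registrars.add(g);
        }
    }

    public void removeRegistrar(GridComponentRegistrar g) {
        registrars.remove(g);
    }

    public void registerSelf() {
        for (GridComponentRegistrar g : registrars) {
            g.register(this);
        }
    }

    public void deRegisterSelf() {
        for (GridComponentRegistrar g : registrars) {
            g.deRegister(this);
        }
    }
}

// Hypothetical stand-in for a Row, Col or LineAxis: it just records
// which candidates are currently registered with it.
class RecordingRegistrar implements GridComponentRegistrar {
    final List<GridComponentCandidate> seen = new ArrayList<>();

    public void register(GridComponentCandidate c) {
        if (!seen.contains(c)) {
            seen.add(c);
        }
    }

    public void deRegister(GridComponentCandidate c) {
        seen.remove(c);
    }
}
```

A test can then hand a Cell two RecordingRegistrar objects in place of a real Row and Col, and check what was registered without building a whole Grid.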

LineAxis more complex

For LineAxis, when it is asked to register(Line l) or deRegister(Line l), the behaviour may be quite complex. For example:
  1. Create line 3 proxy
  2. Update line 3 and register
  3. ... Axis appends blank lines 0-2, line 3
  4. Create line 5 proxy
  5. Update line 5 and register
  6. ... Axis appends blank line 4, line 5
  7. Line 3 becomes uninteresting and requests to deregister.
  8. ... however, Line 3 is not the last line, so no action is taken.
  9. Line 3 becomes interesting again and requests to register.
  10. ... Axis already has a Line 3.
  11. ... so we check that it is the same as the registering Line 3.
  12. ... it is, so we do nothing. (If it had been different, we'd need to raise an exception: see the next section on Out Of Date Proxies)
  13. Line 5 becomes uninteresting and requests to deregister.
  14. ... Line 5 is the last line so
  15. ... LOOP: The last line (5) is uninteresting, so deregister it.
  16. ... LOOP: The last line (4) is uninteresting, so deregister it.
  17. ... LOOP: The last line (3) is interesting -> BREAK
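
The steps above could be sketched along these lines. TrimAxis and TrimLine are hypothetical names for this example only, the interesting flag is set by hand instead of being computed, and IllegalStateException is just my assumption for which exception to raise.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical line: 'interesting' is a plain flag here for illustration.
class TrimLine {
    final int order;
    boolean interesting;

    TrimLine(int order) {
        this.order = order;
    }
}

// Hypothetical axis showing padding on register and trimming on deregister.
class TrimAxis {
    private final List<TrimLine> lines = new ArrayList<>();

    int size() { return lines.size(); }

    TrimLine get(int order) { return lines.get(order); }

    void register(TrimLine line) {
        if (line.order < lines.size()) {
            // A line already exists at this order: fine only if it is the same object.
            if (lines.get(line.order) != line) {
                throw new IllegalStateException("out-of-date proxy for line " + line.order);
            }
            return;
        }
        // Pad with blank (uninteresting) lines up to the new order, then append.
        while (lines.size() < line.order) {
            lines.add(new TrimLine(lines.size()));
        }
        lines.add(line);
    }

    void deRegister(TrimLine line) {
        // Only the current last line can trigger any shrinking.
        if (lines.isEmpty() || lines.get(lines.size() - 1) != line) {
            return;
        }
        // Drop trailing lines until the last one is interesting.
        while (!lines.isEmpty() && !lines.get(lines.size() - 1).interesting) {
            lines.remove(lines.size() - 1);
        }
    }
}
```

Running the scenario through this sketch, the axis grows to 4 lines, then 6, stays at 6 while Line 3 flips back and forth, and shrinks back to 4 once Line 5 deregisters.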

Out Of Date Proxies

If we keep a reference to a proxy after another proxy has autovivified, it may be out of date. For example:
  1. Create line 3 proxy.
  2. Create line 5 proxy.
  3. Update line 5 and register
  4. ... Axis appends blank lines 0-4, and line 5.
  5. Get and modify (real) line 3.
  6. Update line 3 proxy and register
  7. CONFLICT: line 3 already exists.
There may be better ways to deal with this, but I'm currently assuming that only these objects:
  • Current Cell
  • Selected Range
  • Destination Range (for copy-paste)
will ever contain proxy cells or ranges. As they will constantly select new objects, old copies of the proxies will not stick around. If a careless programmer (me) forgets this and keeps an out-of-date proxy, then an Exception will be raised at runtime. I thought that this kind of problem might be irritating to debug, as the proxy could have been assigned at any time; but on the other hand, if the proxy isn't in one of the documented objects above then it might actually be easy to track down.

There is an additional complexity for Ranges: the Head Line will need to be registered before the Tail Line. (Otherwise, registering the Tail Line first fills in a blank line at the Head's position, and when the Head Line then tries to register it will be rejected as Out of Date!)
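
A small sketch of the ordering problem, under the same assumptions as before: RangeAxis, RangeLine and registerRange() are names invented for this example, and IllegalStateException stands in for whatever exception the real code ends up raising.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical line, identified only by its position on the axis.
class RangeLine {
    final int order;

    RangeLine(int order) {
        this.order = order;
    }
}

// Hypothetical axis that rejects out-of-date proxies.
class RangeAxis {
    private final List<RangeLine> lines = new ArrayList<>();

    int size() { return lines.size(); }

    void register(RangeLine line) {
        if (line.order < lines.size()) {
            // Something already occupies this position.
            if (lines.get(line.order) != line) {
                throw new IllegalStateException("out-of-date proxy for line " + line.order);
            }
            return;
        }
        // Fill the gap with blank lines, then append the registering line.
        while (lines.size() < line.order) {
            lines.add(new RangeLine(lines.size()));
        }
        lines.add(line);
    }

    // Register the Head first, so that padding for the Tail
    // cannot take the Head's position.
    void registerRange(RangeLine head, RangeLine tail) {
        register(head);
        register(tail);
    }
}
```

With proxies for lines 3 and 5, registerRange(head, tail) leaves 6 lines on the axis. Registering the tail first and then the head throws, because a blank line 3 was already appended as padding.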

Data::Table

Thinking about representations of the Grid, I recalled the Perl module Data::Table. Looking at this, it appears that it maintains Cells in either Rows or Columns (but not both). However, if it would be useful to treat them in another way (for example to insert a Column in a Row-based table), then Data::Table internally rotates the table to the other representation before continuing. That's actually quite clever, and more efficient in terms of storage requirements. I don't think I'll follow suit, but it's something to think about.
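
For what it's worth, here is roughly how I imagine the rotation idea would look in Java. This is my own sketch of the technique, not a translation of Data::Table's code: RotatingTable and all of its methods are hypothetical, the cells are plain Strings, and the table is assumed to be rectangular.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical table that stores cells either row-wise or column-wise,
// and rotates to whichever form suits the requested operation.
class RotatingTable {
    private List<List<String>> data = new ArrayList<>();
    private boolean rowBased = true;

    boolean isRowBased() { return rowBased; }

    void addRow(List<String> row) {
        ensure(true);                       // cheap append when row-based
        data.add(new ArrayList<>(row));
    }

    void insertColumn(int index, List<String> col) {
        ensure(false);                      // cheap insert when column-based
        data.add(index, new ArrayList<>(col));
    }

    String get(int r, int c) {
        return rowBased ? data.get(r).get(c) : data.get(c).get(r);
    }

    // Rotate the storage only if it is not already in the wanted form.
    private void ensure(boolean wantRowBased) {
        if (rowBased != wantRowBased) {
            data = transpose(data);
            rowBased = wantRowBased;
        }
    }

    // Turn a list of rows into a list of columns (or vice versa).
    static List<List<String>> transpose(List<List<String>> in) {
        List<List<String>> out = new ArrayList<>();
        if (in.isEmpty()) {
            return out;
        }
        int inner = in.get(0).size();
        for (int j = 0; j < inner; j++) {
            List<String> line = new ArrayList<>();
            for (List<String> src : in) {
                line.add(src.get(j));
            }
            out.add(line);
        }
        return out;
    }
}
```

The trade-off is that each cell is held exactly once, at the cost of a full rotation whenever the access pattern switches between rows and columns.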
Posted by osfameron at 11:21 AM | Comments (0) | TrackBack

June 01, 2003

Gone coding...

I'll be away from this blog for maybe a week or so, as I'm a) busy, and b) actually writing some code in whatever time I've got left.

Messages of support gratefully received to gilgamesh at osfameron dot abelgratis dot co dot uk, or as comments here!

Posted by osfameron at 12:23 PM | Comments (0) | TrackBack