This is a list of things I'd like to see Luminous do in future but which I have no immediate plans to implement. These are therefore things that could be picked up by a contributor, if anyone wishes.
- Luminous internally represents a token stream as an XML string. This comes from the scanner and is passed to the formatter. This is a leftover from early versions and isn't ideal: it's not strictly necessary and the string conversion/parsing is ugly and probably slows things down for the formatter. The token stream is stored using PHP data structures until the very last point before the scanner hands over control. It would be simple to remove this were it not for the fact that filters sometimes go into the token structures and start nesting other tokens, by embedding XML. Therefore a better way of handling hierarchical tokens is needed as well. If I recall correctly, the stateful scanner may have something along these lines.
- The token structure in Luminous is pretty much non-existent. Tokens are just arbitrary strings. These could be pulled out into more central definitions and thereby form a half-way coherent specification of what CSS classes should be respected.
- It would be good to structure them better; at the moment tokens are entirely individual, whereas sometimes tokens can be seen as a kind of hierarchy - e.g. most programming languages have different classes of keywords such as keywords (e.g. 'if') and keyword operators (e.g. 'and'). These could be better transcribed into the css as .keyword and .keyword.operator.
- The cache can use MySQL as a storage location, but no other RDBMS due to reliance on INSERT IGNORE. It may be that the queries or logic can be rewritten using standard SQL, or failing that, RDBMS specific queries and a settings parameter would be acceptable.
- Performance and optimisation: Luminous is slow. It 'seems' slower than it should be, but I haven't managed to identify any easily removable bottlenecks. Therefore performance improvements would be welcome (but not at the expense of code readability or quality).
- Virtually all languages can be improved!
- I'd like to see the Ruby scanner improved, or at least tested, by someone who actually understands Ruby's grammar (I've written about 10 lines of Ruby, ever).
- The CSS scanner should ideally handle dialects like SASS [sass-lang.com] and CleverCSS [sandbox.pocoo.org]. (Note: SASS apparently has a legacy syntax which makes it very similar to CleverCSS).
- The line numbered HTML output uses a somewhat overly complicated markup, utilising a table and having the line numbers and code in two adjacent cells. The reason for this is that it provides good control over both the code and numbering, and makes the numbering seem transparent (i.e. copying and pasting the code won't get the numbering too). There may be better ways to achieve this, and the markup could probably be made cleaner.
- Various mature syntax highlighting plugins exist for various blogging platforms, providing things like integration into the rich text editor. Most of these are hard coded to use a specific highlighter (SyntaxHighlighter and Geshi seem to be the most popular). License permitting it may be possible to abstract away their dependence on a particular highlighter.
- The CodeIgniter plugin (GitHub [github.com]) uses regular expressions to extract code blocks from the page's markup. We all know the dangers of parsing XML with regular expressions. A good alternative would be using xpath, however, it also recognises [code] ... [/code], which is invisible to XML.