Archive for the ‘Java’ Category

Recognizing URLs within plain text, and displaying them as clickable links in HTML, in Wicket

Wednesday, March 12th, 2014

For a customer project, I have just written code which takes user-entered plain text and turns it into HTML in which URLs are marked up as clickable links.

Although marking up links in user-entered text is standard functionality, Stack Overflow would have you believe that it’s something that should not be attempted, as it cannot be done perfectly. This is technically correct. However, users are accustomed to software which makes a best-effort attempt, and customers are accustomed to taking delivery of software which meets users’ expectations.

The software I have written is available as open-source, either as a Java class with the method encodeLinksToHtml which takes some plain text and returns safe HTML with clickable links, or as a component in the Wicket web framework called MultilineLabelWithClickableLinks.

Finding links within text is not as easy as it seems

Users may enter URLs with or without a protocol (http://). Domains may or may not have www at the start. There may or may not be a trailing slash. There may or may not be information after the URL. Having a whitelist of acceptable domain endings such as “.com” is a bad idea, as the list is large and subject to change over time. Punctuation after links should not be included (for example “see foo.com.”, with a trailing dot which is not part of the URL).

The software matches “foo://foo.foo/foo”, where:

  • Protocol is optional
  • Domain must contain at least one dot
  • Last part is optional and can contain anything apart from space and trailing punctuation (= part of the sentence in which the link is embedded)

Quotes are not allowed because we don’t want <a href=”foo”> to have foo containing quotes (XSS).
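
The matching rules above can be sketched as a regular expression. This is a minimal sketch and not the pattern used in the published class: the class name LinkFinder, the method findLinks and the exact character sets are assumptions made for illustration. Being best-effort, it also produces false positives for things like “e.g” or “3.14”.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class LinkFinder {

    private static final Pattern URL = Pattern.compile(
        // Optional protocol, e.g. "http://" or "foo://"
        "(?:[A-Za-z][A-Za-z0-9+.-]*://)?"
        // Domain: at least two labels separated by dots
        + "[A-Za-z0-9-]+(?:\\.[A-Za-z0-9-]+)+"
        // Optional path: no whitespace, quotes or angle brackets,
        // and the last character must not be sentence punctuation
        + "(?:/(?:[^\\s\"'<>]*[^\\s\"'<>.,;:!?)])?)?");

    /** Returns every URL-like substring of the text, in order of appearance. */
    static List<String> findLinks(String text) {
        List<String> links = new ArrayList<String>();
        Matcher m = URL.matcher(text);
        while (m.find()) {
            links.add(m.group());
        }
        return links;
    }
}
```

For example, findLinks("see foo.com.") finds “foo.com” without the trailing dot, because every domain label must contain at least one character after its dot.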

Making links clickable is not as easy as it seems

Facts:

  • Conversion from plain text to HTML requires that entities such as “&” get replaced by “&amp;”.
  • Links such as “foo.com/a&b” need to get replaced by “<a href=’foo.com/a&b’>foo.com/a&amp;b</a>”. (“&” in URL needs to stay “&” in the href, but needs to become “&amp;” in the visible text part)

Therefore,

  • One cannot firstly replace entities and then markup links, as the links should contain unescaped “&” as opposed to “&amp;”.
  • One cannot firstly encode links and then replace entities as the angle brackets in the link’s “<a href..” would get replaced by “&lt;a href…” which the browser would not understand.

Therefore, the replacement of HTML entities, and the replacement of links, must be done in a single (complicated) pass, rather than two (simple) passes.
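
A single pass can be sketched as follows: walk through the text match by match, escape the plain text between links, and emit each link with a raw href and escaped visible text. This is a sketch under stated assumptions, not the published encodeLinksToHtml; the class name SinglePassLinkEncoder, the simplified URL pattern and the helper escape are hypothetical. Following the example above, the href receives the URL unescaped, which is only tolerable because the pattern excludes quotes and angle brackets.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class, for illustration only.
final class SinglePassLinkEncoder {

    // Simplified pattern (an assumption): optional protocol, dotted domain,
    // optional path without whitespace, quotes, angle brackets
    // or trailing sentence punctuation.
    private static final Pattern URL = Pattern.compile(
        "(?:[A-Za-z][A-Za-z0-9+.-]*://)?"
        + "[A-Za-z0-9-]+(?:\\.[A-Za-z0-9-]+)+"
        + "(?:/(?:[^\\s\"'<>]*[^\\s\"'<>.,;:!?)])?)?");

    static String encode(String plainText) {
        StringBuilder html = new StringBuilder();
        Matcher m = URL.matcher(plainText);
        int pos = 0;
        while (m.find()) {
            // Plain text between the previous link and this one: escape it
            html.append(escape(plainText.substring(pos, m.start())));
            String url = m.group();
            // The href gets the raw URL, the visible text gets the escaped URL
            html.append("<a href='").append(url).append("'>")
                .append(escape(url)).append("</a>");
            pos = m.end();
        }
        // Remaining plain text after the last link
        html.append(escape(plainText.substring(pos)));
        return html.toString();
    }

    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }
}
```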

Jetty doesn’t show errors on web application start-up

Monday, December 23rd, 2013

From a certain version of the “jetty” package in Debian Linux, if the web application didn’t start up (servlet init() throws an Exception), this error wasn’t logged anywhere. The solution is to install the libjetty-extra package.

sudo apt-get install libjetty-extra

It took some amount of experimentation to find the solution. I don’t know why you’d ever want to not log errors; i.e. why the logging of errors is an “extra”.

Jetty is a Java web server, similar to Tomcat, and it’s my server of choice. It’s simple, doesn’t seem to do much apart from run WARs, and (apart from this issue) I’ve rarely had any problems with it.

Wrapping IDs in objects

Thursday, December 5th, 2013

In Java, I like to wrap ID values in objects, rather than just passing them around the code as their native “int” or “long” or “String” values.

There are two reasons why using an object is better:

  • Code becomes more readable, for example foo(LoginId x) is more readable than foo(long x). (Although perhaps neither foo nor x is a good name, so perhaps the example exaggerates this improvement.)
  • The compiler can do more checking. If you pass a job advert’s id (as a long) to a function expecting a login id (as a long), the compiler cannot warn you of your mistake. This becomes particularly relevant if you have a function taking two IDs and you pass them the wrong way around.

Things to consider when writing such an “ID object”

  • Do not allow the ID contained within the object to be null. Having “LoginId x” where x is not null, but the value contained within x is null, makes no sense. (For example, use primitive types if dealing with numerical IDs, as they cannot be null.)
  • If the ID is a string, don’t allow this string to be empty; same as above.
  • Implement equals and hashCode methods so that these IDs can be used within Sets, or as keys within Maps, or as keys in Wicket drop-downs, or wherever.
  • Make them serializable.
  • Make the ID attribute a “public final” attribute. That means useless getter methods can be avoided, from the object itself and from client code.
  • The object should have a single constructor which takes the value and sets it in the attribute.
  • Implement toString so that the debugger can display the object usefully.
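
Put together, the points above might look like the following minimal sketch for a numerical ID. The class name LoginId is taken from the example earlier in this post; everything else is an illustration, not code from a real project.

```java
import java.io.Serializable;

// Sketch of an "ID object" wrapping a numerical ID.
final class LoginId implements Serializable {

    private static final long serialVersionUID = 1L;

    // Primitive type, so the contained value can never be null.
    // Public and final, so no getter is needed.
    public final long id;

    // Single constructor which takes the value and sets the attribute.
    public LoginId(long id) {
        this.id = id;
    }

    // equals and hashCode allow use in Sets and as keys in Maps.
    @Override public boolean equals(Object other) {
        return other instanceof LoginId && ((LoginId) other).id == id;
    }

    @Override public int hashCode() {
        return (int) (id ^ (id >>> 32));
    }

    // Lets the debugger display the object usefully.
    @Override public String toString() {
        return "LoginId(" + id + ")";
    }
}
```

A method declared as foo(LoginId x) then no longer compiles when it is accidentally handed a different ID type, which is exactly the checking a plain long cannot provide.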

Are Java enum values instances or classes?

Saturday, April 13th, 2013

That depends on whether the enum values provide methods which differ from one another.

The following code produces just one class file, “Network.class”. “facebook” and “linkedIn” are just instances of the Network Java class.

public enum Network {
  facebook,
  linkedIn;

  // name() is the built-in accessor for an enum constant's identifier
  public void printName() { System.out.println(name()); }
}

But the following code produces one class file for each value, named “Network$1.class” etc., as well as one class file for the abstract superclass, “Network.class”.

public enum Network {
  facebook {
    public Client newClient() { return new FacebookClient(); }
  },
  linkedIn {
    public Client newClient() { return new LinkedInClient(); }
  };

  public abstract Client newClient();
}

“facebook” and “linkedIn” are in fact different Java classes now.

Having a constructor taking parameters, and initializing each value of the enum by calling this constructor with values, is not sufficient to force the generation of individual classes per value.

Just because they are different classes in this situation doesn’t automatically mean you can do everything you’d expect to be able to do with a class. You can’t test for class membership using “instanceof” for example (not that this would be very useful for an enum, as there is only one instance of every value).
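
The difference can also be observed at runtime. The following is a small sketch with made-up enums (Plain and WithBodies are hypothetical names, nested in one class only to keep the example in a single file): getClass() returns the per-value class when values have their own bodies, whereas getDeclaringClass() always returns the enum type itself.

```java
// Hypothetical demo class, for illustration only.
final class EnumClassDemo {

    // Values without bodies: plain instances of the enum class
    enum Plain { facebook, linkedIn }

    // Values with bodies: each value gets its own class
    enum WithBodies {
        facebook { @Override String greeting() { return "hello from facebook"; } },
        linkedIn { @Override String greeting() { return "hello from linkedIn"; } };

        abstract String greeting();
    }

    public static void main(String[] args) {
        // prints "true": the value is an instance of the enum class itself
        System.out.println(Plain.facebook.getClass() == Plain.class);
        // prints "false": the value is an instance of a per-value subclass
        System.out.println(WithBodies.facebook.getClass() == WithBodies.class);
        // prints "true": getDeclaringClass() ignores the per-value subclass
        System.out.println(WithBodies.facebook.getDeclaringClass() == WithBodies.class);
    }
}
```

So code which needs “the enum type of this value” is safer calling getDeclaringClass() than getClass().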

Wicket component to count up to a value, then refresh its value from the server

Monday, January 7th, 2013

On the Avaaz website there is a nice animation to display the number of users the site currently has. The animation is a number, which:

  • Initially, counts up from 0 to the current number of users client-side, taking a few minutes
  • Thereafter, refreshes its value from the server regularly, increasing its value as new users register.

I needed something similar for a project I’m working on, so I’ve written a Wicket component to do it.

It’s perhaps not the most revolutionary component ever! But maybe it’s useful to someone. It’s available under the GPL.

CountingUpThenAutoRefreshingLabel  JavaDoc | Download.

The above example counts up to 1M and then the “refresh” is faked, for this example, by being a number which slowly increases; imagine it’s the number of users the site has, or similar. (It wouldn’t normally be in an iframe either; that is a consequence of the fact this blog is WordPress not Wicket.)

I think this is a good animation in terms of keeping the server load low. Initially, there are no server calls, as the animation is done client-side (this covers the normal use case, in which the page isn’t open for long). The second phase re-queries the server at an interval, but this interval gets longer (i.e. the updates become slower), so that the other use case (user opens the browser and goes to lunch) also won’t result in N users hitting the server every 2 seconds for the rest of time.

Wicket 1.4–1.5, Java 1.6

Email Template

Thursday, January 3rd, 2013

One often needs to send users email notifications. Ideally these should have the following capabilities:

  • Text is stored in separate files (not strings in the source code)
  • Localizable into different languages
  • Variables (“Hello Adrian!”)
  • Attachments (e.g. PDF invoices)
  • HTML (for styling)
  • Plain text alternative version
  • Inline images (e.g. company logo)
  • Variables displayed properly in HTML mails (escaping of “&” etc.)
  • Variables which are embedded directly in the HTML (“Here are your products: <table>…”)
  • Supporting unit testing (no email sent; assertions over what would have been sent)
  • HTML is easy to edit (double-click file, see inline images, use HTML editor, etc.)
  • Unicode (UTF-8) only

Finding no such library available, I wrote my own in Java: EmailTemplate. Javadoc | Download.

Did I overlook something? Is there a library out there which does the above which everyone else uses? (In case not, my library is open source.)

This software has a long history. Originally developed for Onestop Concept in Perl, then developed for Easyname in PHP (no longer used), and then for United Youth in Java, it’s been updated for the HR website I’m developing.

How to generate “svn info --xml” programmatically

Wednesday, December 12th, 2012

In Java, there is the great SVNKit which I have used for many a customer project successfully. You can do all Subversion operations in a few lines of Java, with a great object-oriented interface, exceptions, and so on. It’s easy to use, and it just works.

Today I had to produce certain XML files programmatically based on “svn info --xml” information. I was glad to see there was an SVNXMLInfoHandler which allows you to write the XML into any SAX ContentHandler. Again, great software design, it’s exactly what you want.

Alas, it didn’t quite work. At least in my setup of using SAXON and Xerces for XML processing, which we are already using for the project in hand. (In a different part of the software, XSLT 2.0 processing is done, and SAXON’s about the only library I know for any language that can do it.)

The problems were:

  1. That SAX events were written to the stream but the document was never started/ended,
  2. The source provided no namespace information, but SAXON requires namespace information (even if just to explicitly say that the “empty namespace” is used).

Unhelpfully, regarding point (2), I got the error

org.xml.sax.SAXException:
Parser configuration problem: namespace reporting is not enabled

I didn’t really know what this meant, especially in the context of SVNKit producing “svn info --xml” information. Looking at the source of SAXON, it turns out that the wording of the error makes sense if you’re using an XML parser to feed the XML tags into SAX, as opposed to e.g. SVNKit. In that case, the XML parser would have a document with namespace information, and could “report” it to SAX, or not. So, in other words, SVNKit wasn’t producing tags with namespace information, and SAXON didn’t like that.

Looking at the source of SVNKit we can see lines like:

getHandler().startElement("", "", tagName, mySharedAttributes);

So startElement is being called e.g. like

startElement("", "", "info").

What SAXON needs is something like

startElement("my-namespace-url", "info", "ns:info").

So, alas, I had to develop the following class. Its usage is shown underneath it:

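import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Note: "ns" (the namespace URI) and "commandType" (whose name becomes the
// root element) are defined elsewhere and are not shown in this excerpt.
class SvnKitDomCreator extends IdentityForwardingSaxHandler {

  public SvnKitDomCreator(ContentHandler destination)
  throws SAXException {
    super(destination);
    // SVNKit never starts the document itself, so do it here
    startDocument();
    startPrefixMapping("svn", ns);
    startElement("", "", commandType.name(), new AttributesImpl());
  }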

  @Override public void startElement(
    String uri, String localName, String qName, Attributes noNsAttr
  ) throws SAXException {
    AttributesImpl nsAttr = new AttributesImpl();
    for (int i = 0; i < noNsAttr.getLength(); i++)
      nsAttr.addAttribute("", noNsAttr.getQName(i),
        "svn:" + noNsAttr.getQName(i), noNsAttr.getType(i),
        noNsAttr.getValue(i));
    super.startElement(ns, qName, "svn:" + qName, nsAttr);
  }

  @Override public void endElement(
    String uri, String localName, String qName
  ) throws SAXException {
    super.endElement(ns, qName, "svn:" + qName);
  }

  public void close() throws SAXException {
    endElement("", "", commandType.name());
    endPrefixMapping("svn");
    endDocument();
  }
}

SvnKitDomCreator domWriterWithLocalName = 
  new SvnKitDomCreator(domWriter);
clientManager.getWCClient().doInfo(repository,
  SVNRevision.HEAD, SVNRevision.HEAD, SVNDepth.EMPTY,
  new SVNXMLInfoHandler(domWriterWithLocalName));

The source for the referenced IdentityForwardingSaxHandler is LGPL: Download Databases & Life Util ZIP.

P.S. Hooray for open source! I’m sure I’d never have managed to get to the bottom of this if I hadn’t had the source for SVNKit and SAXON. I would have just been stuck with the conclusion that SVNKit produced XML, SAXON consumed XML, and “something” to do with namespaces was going wrong.

SVNKit 1.7

Why am I using Hibernate?

Tuesday, December 11th, 2012

This was a blog post I found in my “drafts” folder from 2009. I am happy to report that I am no longer using Hibernate, and thus my stuff works.

Something is going wrong in my life. I read the book about Hibernate by its author, Gavin King, and I was really impressed, for the first time in my life, by an object-relational mapping system. Although I doubted it would really save development time over writing SQL manually, I thought it wouldn’t significantly increase development time either, and it had a few nice features (like being able to have a key-value Map in an object, and that Map getting persisted to its own table automatically).

But alas that was where my life started to go wrong. I’ve now completed a number of projects using Hibernate, and it’s all been hell, really.

  • I spent a whole work day trying to get the system to reconnect to the database after the connection was lost – this is 1 line in WebTek (Perl framework written by a friend, I have deployed 2 apps using it), or in my hand-written Uboot code, etc. (blog post)
  • If you use the Query API to say “give me any objects in the following set” and pass an empty set, rather than no objects getting returned, invalid SQL gets generated. I filed a bug report, but was told that generation of invalid SQL wasn’t an error. (And he, the lead developer of Hibernate, was hardly polite, see the last comment.)
  • If you persist a key-value Map to a database table (e.g. with a “key” col and a “value” col), and you overwrite a value in the Map (e.g. was “foo”->“bar” but now is “foo”->“baz”), then you’ll get a unique constraint violation, as inserts of new values are done before deletes of old values. According to the bug report, this is expected behavior. My solution: drop the constraint, which I’m not too happy about. (Constraints are not only useful for data integrity, but also act as assertions which help the database to choose the fastest statement execution plan.)

I mean basically everything I try, no matter how simple, just doesn’t work.

Hibernate has 150 KLOC (150,000 lines of code) and you still need separate “connection pool” software (to solve the problem of reconnecting to the database, and also for performance reasons apparently). The recommended one seems to be C3P0, which is another 50 KLOC (and what a useless name!). You can imagine how many bugs such a piece of software must have, due to its size. And I can only imagine what happens with stacks such as “Gorm uses Spring uses Hibernate uses C3P0 uses database client uses database”: your software then has the union of the sets of bugs which each layer contains, in addition to your own bugs.

Just now I’ve looked at a live server, which is under no load, and according to the access log there have only been a few web requests at 9am, yet the web server log shows:

2009-06-02 01:53:41,889 DEBUG:
com.mchange.v2.c3p0.impl.NewPooledConnection:
com.mchange.v2.c3p0.impl.NewPooledConnection@bb21ab closed by a client.
java.lang.Exception: DEBUG -- CLOSE BY CLIENT STACK TRACE
at com.mchange.v2.c3p0.impl.NewPooledConnection.close
   (NewPooledConnection.java:491)

Nothing like “shouting” (all caps) in an Exception message to inspire confidence. Or using the word “debug” when throwing an Exception. Or for that matter throwing the baseclass Exception and not creating a specific Exception subclass.

Incidentally this 2009-06-02 log was written to the 2009-06-01 logfile, and there were no web requests at any point on 2009-06-01. (Not sure if Jetty or log4j is at fault there.)

I need to consider alternative technologies: basically anything I deploy with this stack just isn’t going to work…

Mistake in the Javadoc for “Iterable”

Thursday, May 31st, 2012

Java uses the keyword “for” for “for each” semantics, which I have always thought was strange:

for ( ... ; ... ; ... )  { ... }  // standard C stuff
for (Object x : list) { ... }     // new "for each" syntax

These are two different things, and most languages use the keyword “foreach” for the second. I think “foreach” is more readable, and I don’t see what Java stands to gain by “overloading” the “for” keyword.

Anyway, the second form takes either an array or an “Iterable”. Check out the documentation for the “Iterable” interface, which talks about the “foreach” statement, although no statement of that name exists in Java. It is succinct, but wrong.

http://docs.oracle.com/javase/7/docs/api/java/lang/Iterable.html

I conclude from this that they weren’t quite sure themselves how to name the keyword, and that they made their decision quite late in the day.
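
For reference, here is a minimal sketch of what the second form of the loop accepts: any class implementing Iterable can be the target of the “for” keyword. The class Countdown is a made-up example, not something from the JDK.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical example class: iterates from a start value down to 1.
final class Countdown implements Iterable<Integer> {

    private final int from;

    Countdown(int from) {
        this.from = from;
    }

    @Override public Iterator<Integer> iterator() {
        return new Iterator<Integer>() {
            private int current = from;

            @Override public boolean hasNext() {
                return current > 0;
            }

            @Override public Integer next() {
                if (current <= 0) throw new NoSuchElementException();
                return current--;
            }

            // Must be implemented explicitly before Java 8
            @Override public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }
}
```

With this in place, “for (int n : new Countdown(3)) { ... }” visits 3, 2 and 1, using the keyword “for” rather than “foreach”.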

IDE reverse auto-complete

Sunday, April 8th, 2012

It just occurred to me that the following might be possible, and not only is it possible, Eclipse already does it.

Normally IDE auto-complete works forwards, i.e. you declare your variables, and then when you come to use them, they are suggested.

But it can/does also work the other way around. You use a variable which is undefined, then place your cursor above the usage and start to declare it, and it fills in the name of the variable you want to define for you.