Archive for the ‘uboot’ Category

NoSQL because SQL is too slow?

Monday, March 8th, 2010

I just read this excellent article about performance problems being the reasoning behind using NoSQL databases.

http://thoughts.j-davis.com/2010/03/07/scalability-and-the-relational-model/

However, I wonder if it’s even reasonable to search for solutions (NoSQL, etc.) to performance problems with the relational model.

I think as computers get faster and faster (CPUs, SSDs, more memory, …) the set of problems which are “too slow” for a particular technology (or mindset) gets smaller and smaller.

I used to have the pleasure of having to optimize systems (based on SQL, though some of our solutions involved leaving the relational model), e.g. a community with 4M users (uboot.com), but nowadays I have no customer for whom even an open-source database installed on reasonably inexpensive hardware is insufficient.

Of course, that’s just my experience, and there certainly are many problems, companies, etc. today which require solving performance problems in the database, but I assert a lot of the people proposing NoSQL as a solution to performance problems with SQL databases don’t have the performance problems in the first place; I mean not everyone is implementing Facebook, Google, …

We took uboot.com online 10 years ago

Sunday, February 21st, 2010

At approx 6:30am on Monday 21st Feb 2000 my boss, my colleague and I took the first version of Uboot online.

It certainly didn’t have all the features it needed back then; it took us one extra week to add an address book to the messaging functionality, for example. And it didn’t have any of the photo sharing, video sharing, blogging, or e-commerce functions that it has now.

It’s actually amazing how fast we developed the first version with a team of three software developers: I started at the company on 3rd Jan 2000 and completed two other projects before starting Uboot; the first commit was on 19th Jan 2000.

With a few exceptions, I have been involved with Uboot software development, sometimes more, sometimes less, since then.

The Internet: It’ll get slower before it gets faster

Thursday, January 31st, 2008

For the 3 weeks I was in the UK recently I used a UMTS modem (i.e. like a 3G phone) to surf the web and do all my work. When I went round to the house of my friend Robin, who also works in IT, I found he does all his surfing through a cable from his phone to his computer: i.e. also UMTS.

At least in the UK, this is extremely popular. It also makes a lot of sense in Asia; they have excellent high-speed mobile phone networks there, and as for all one’s preconceptions about Asians having the latest handset devices: I can confirm first-hand that they’re all true.

As we all know and have been experiencing since about 2000, more and more phones are going to get more powerful and have larger screens. Full browsers will (and do) run on them. They will also be UMTS devices.

And for those people who don’t surf via UMTS, nearly everyone I know surfs at home using WLAN. A lot of offices use WLAN too. And obviously all the surfing at airports, coffee houses, hotels, conferences etc. all goes on via WLAN.

UMTS and WLAN have high bandwidth, but they have extremely high latency compared with a cable connection. That means that although the bytes flow fast once they’ve started, it takes a long time for the first byte to arrive.

I am quite proud of the fact that when I designed the “Uboot Joe” software (Windows software which ran on the user’s PC, sat on the notification area by the clock, and communicated with Uboot) I took this into account. Every action you do with the Joe is at most one client-server round trip. For example to view all the thumbnails in a folder, there is a single request from Joe to the server like “get all data in folder_id” and the return structure is a) information about the folder, b) information on all the photos within the folder, and c) all the binary JPEG data for the thumbnails of all those images. You can try using the Uboot Joe on a UMTS link, and it works faster than any website.
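A minimal sketch of that single-round-trip shape, written in Python purely for illustration. The names `Photo`, `FolderResponse`, `get_folder` and the in-memory `FAKE_STORE` are assumptions of mine, not the real Joe protocol; the point is only that folder info, photo info and the thumbnail bytes all travel in one reply.

```python
from dataclasses import dataclass, field


@dataclass
class Photo:
    photo_id: int
    title: str
    thumbnail_jpeg: bytes  # binary thumbnail travels in the same response


@dataclass
class FolderResponse:
    folder_id: int
    folder_name: str
    photos: list = field(default_factory=list)


# Hypothetical stand-in for the server-side storage.
FAKE_STORE = {
    7: ("Holiday", [(1, "Beach", b"\xff\xd8beach"), (2, "Hills", b"\xff\xd8hills")]),
}


def get_folder(folder_id: int) -> FolderResponse:
    """Answer "get all data in folder_id" with everything in a single reply."""
    name, rows = FAKE_STORE[folder_id]
    photos = [Photo(pid, title, jpeg) for pid, title, jpeg in rows]
    return FolderResponse(folder_id, name, photos)


response = get_folder(7)
round_trips = 1  # folder info, photo info and thumbnails all arrived together
```

The client never has to come back and ask for thumbnail number 2; whatever it needs to draw the folder is already in `response`.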

Contrast this design with HTML. The first response from the server contains <img src=xx> tags and only once that has been received can the browser make the further requests necessary to retrieve the images. If the first bytes of every response take a long time to arrive, then the user experiences that “long time” twice before they see the data they requested; first to get the HTML page then second to get the images.

In fact it’s worse. If a page has 50 embedded images, it doesn’t open up 50 concurrent connections to the server (for good reason). Instead it opens e.g. 4 connections. Which means that e.g. image number 5 has to wait for the “long time” of fetching image 1 to complete. (Some sites try to get around this by having lots of servers with different names e.g. img341.domain.com and distributing the images over these servers.)

And it’s even worse than that. Even if the application only does one round-trip to the server, the underlying protocols might do more round-trips, for example firstly to contact the DNS server to get the IP address for the domain name used in the URL; and then secondly to request the data from the server.

In addition to this being a problem with UMTS and WLAN, one also has to take into account that the Internet is global. When I’m in Macau accessing European servers I get a round trip of about 300ms. So if one adds three “long times” to an otherwise extremely fast request—easily done—one has added a whole second on to the time the user has to wait. And Jakob Nielsen says that after 1 second in total, users start to lose focus on what they’re doing.
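The arithmetic can be sketched as a back-of-envelope model in Python. It assumes every serial round trip costs one full latency and deliberately ignores transfer time, connection handshakes and server processing; the function name `page_wait_ms` and its parameters are mine, and the 50 images over 4 connections are the example figures from above.

```python
import math


def page_wait_ms(rtt_ms: float, images: int, connections: int, dns_lookup: bool = True) -> float:
    """Rough wait: every serial round trip costs one full RTT.

    Ignores transfer time, connection handshakes and server processing.
    """
    serial_trips = 1  # the HTML page itself
    if dns_lookup:
        serial_trips += 1
    # Images are fetched in waves of `connections` at a time.
    serial_trips += math.ceil(images / connections)
    return serial_trips * rtt_ms


macau_to_europe = page_wait_ms(rtt_ms=300, images=50, connections=4)
three_long_times = 3 * 300  # 900 ms, i.e. roughly the "whole second"
```

Under these assumptions, the 50-image page over a 300ms link spends 15 serial round trips waiting, which is several seconds before any bandwidth is even considered.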

So to design applications in this age, one needs to be aware of the number of serial server round-trips (i.e. the number of times you need to ask the server for something and can only ask for the next thing once the previous one has been delivered).

For example:

  1. An HTML page which contains an external CSS file, and this CSS file contains URLs to images.
  2. Pages with many images. The browser only requests a few files from the same server at once, so again the response to image number 1 must be finished before the request for image number 3 can begin.
  3. Javascript software which does multiple serial calls to the server, e.g. “get session token for username/password” then “get info to display on page for session token”.
  4. A form which submits data to a piece of software. The software does something but instead of returning a result page, returns a redirection command to a “real” result page. Often done to allow one to hit “refresh” safely on the result page, or make the URL of the result page look nicer.
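One way to count serial round trips for cases like these is to treat "can only be requested after X has arrived" as a dependency and take the longest chain. The sketch below is in Python with a made-up page description (`page`, `form` and the file names are hypothetical), and it ignores the per-server connection limit from example 2.

```python
def serial_round_trips(resource: str, depends_on: dict) -> int:
    """Length of the longest chain of requests that must happen one after another."""
    children = depends_on.get(resource, [])
    if not children:
        return 1
    return 1 + max(serial_round_trips(child, depends_on) for child in children)


# Hypothetical page matching example 1: HTML -> CSS -> image.
page = {
    "index.html": ["style.css", "logo.png"],
    "style.css": ["background.png"],
}

# Hypothetical form matching example 4: submit -> redirect -> result page.
form = {"submit": ["result.html"]}

page_depth = serial_round_trips("index.html", page)
form_depth = serial_round_trips("submit", form)
```

Here the page needs three "long times" before the background image shows, and the redirecting form needs two.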

GWT is excellent in this regard. It has the ability to download lots of those small icon-size images in one request (it makes one big image on the server and chops it up again on the client) and it makes you explicitly aware of the number of server round trips by forcing you to define interfaces for client-server interactions – as opposed to some automated scheme where you write code and the framework decides when to insert client-server round-trips. (Wicket makes client-server round trips easy with AjaxLink; my fear is it might be too easy, and one might do them too often, and lose the overview of how many are happening.)

Pre-caching is a good idea too. E.g. if you are a photo viewer application, with a photo shown full screen with a “next” button, it makes sense to load the image on the “next” page even before the user’s clicked on it. That download won’t interfere with the rest of the activity the user is doing, as the bandwidth is not the bottleneck, just the time between starting the download and the bytes starting to arrive at the client. (Although one can’t download too much without the user noticing, as some people pay per MB!)
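A minimal sketch of that "load the next photo early" idea, in Python. The `PhotoViewer` class and the `fetch` callable are hypothetical; it only ever looks one photo ahead, which is my assumption for keeping the extra traffic small for people who pay per MB.

```python
from concurrent.futures import ThreadPoolExecutor


class PhotoViewer:
    """Shows one photo and starts fetching the next one in the background."""

    def __init__(self, fetch, photo_ids):
        self._fetch = fetch              # callable: photo_id -> bytes (assumed)
        self._ids = list(photo_ids)
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._pending = {}               # photo_id -> Future
        self._index = 0

    def _prefetch(self, index):
        if 0 <= index < len(self._ids):
            pid = self._ids[index]
            if pid not in self._pending:
                self._pending[pid] = self._pool.submit(self._fetch, pid)

    def current(self) -> bytes:
        pid = self._ids[self._index]
        self._prefetch(self._index)      # no-op if already requested
        data = self._pending[pid].result()
        self._prefetch(self._index + 1)  # only one photo ahead
        return data

    def next(self) -> bytes:
        self._index = min(self._index + 1, len(self._ids) - 1)
        return self.current()

    def close(self):
        self._pool.shutdown(wait=True)


calls = []


def fake_fetch(photo_id):
    # Stand-in for the slow network request.
    calls.append(photo_id)
    return f"jpeg-{photo_id}".encode()


viewer = PhotoViewer(fake_fetch, [10, 11, 12])
first = viewer.current()   # fetches 10, then quietly starts on 11
second = viewer.next()     # 11 was already requested before the click
viewer.close()
```

By the time the user hits “next”, the request for that photo has already been sent, so the latency is paid while they are still looking at the current one.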

But the most important point, I think, is: these days, one must test one’s web applications on a high-latency connection. Historically I have tended to develop locally (everything installed on my laptop), or in an office with a network cable, high-speed Internet and a link to the data center where the test server sits, with the office in the same country as the server. Maybe this sounds strange, but I think one should develop web applications while using a UMTS card.

New Year, New Blog

Tuesday, January 8th, 2008

I shall be blogging here henceforth. I have moved all my old articles over from my previous uboot.com blog.

There were several reasons for the move:

  • It’s important to use the software you write, to experience its successes and limitations. I am a contractor for uboot and have been using their blog; however I am also a contractor for easyname and in December we took the hosting features online we’d been developing in 2006. I’m glad to report they work just great!
  • I wanted more control over the design. The text was small at the uboot blog and didn’t invite reading.
  • I have discovered hierarchical categories! So one can view just my software design posts for example. They are cool. Did uboot have them? If so, I never found them.
  • I wanted a facility for seeing new comments without checking all posts over all pages, and comparing the current number of comments on that post with my memory of the previous number of comments.
  • Trackback wasn’t fully working with uboot. Although I suppose in the time it took me to set up my own blog, I could have repaired the uboot feature!

It wasn’t as easy as I had hoped to set up this blog. I imagined just FTPing over a Wordpress installation and that would be it. While I don’t want to sound ungrateful to the open-source programmers who made both the blogging software and the hosting software possible, there are a sufficient number of small problems – both in implementation and in architecture – with the internet, web servers, web protocols, and in every piece of software, as to make the seemingly simple process of installing some blogging software annoying and time-consuming. I am writing this at the end of the second day of full-time work creating this blog.

  • My plan to import my own data was to import the RSS feed from the old Blog. However, that RSS import software had three separate bugs. I have fixed these problems now in the source, and will submit them in due course.
    • The RSS importer didn’t use an XML parser, but instead regular expressions. Thus it required an <item> tag to look exactly like “<item>”; whereas the uboot RSS feed includes attributes of the item tag, i.e. <item x="y">. So it didn’t match and simply reported that it had “successfully” imported 0 posts.
    • Newlines in the HTML content were turned into <br> tags. My HTML content had a lot of newlines (that’s how the gmail WYSIWYG editor produces the content). These are ignored by the browser, so shouldn’t be turned into <br>s which are not ignored by the browser. I solved the problem by replacing all newlines in the HTML with spaces, before the <br> conversion happened.
    • HTML escaping was being performed needlessly on the article titles. The titles were already in HTML in the RSS file. So “&quot;” text was introduced into the user-visible titles. I know the RSS feed is correct in this regard as it renders correctly on Google Reader, Bloglines, etc. I have removed this conversion.
  • Alas I lacked sufficient knowledge of CSS so getting the style correct was a great pain. Yet I didn’t have particularly ambitious style requirements, as any viewer of the new blog can confirm.
    • One problem that took ages was the removal of some two-coloured vertical lines. Were they images? Clever CSS borders? I couldn’t find the border commands in the CSS file but also couldn’t find any image commands. Nor were they referenced from the HTML file. Finally I checked the images directory and found an image; then full-text searched all files. I found the image referenced in a <style> element at the <head> of one of the HTML files.
    • Various IE7 problems. I even had to insert a “if browser=IE” Javascript in one place.
  • All embedded image references and intra-blog links had to be changed. (I couldn’t even just leave them pointing to the old blog, as they were relative links, i.e. <img src="/x/y.jpg"> so didn’t work after the new content had been imported.)
  • I created a “.htaccess” file to password-protect the website while it was under development. Later I deleted the file. However Wordpress had written some rules into the file (without it being obvious to me) so that URLs like “/post-name” would be mapped to the correct PHP files. So after I deleted the “.htaccess” file to give everyone access, the blog no longer worked (it took me some time to discover that, as the URL “/” still worked, so it was not obvious which action had led to the pages stopping working).
    • Let us not forget that the syntax of “.htaccess” and “.htpasswd” files is far from obvious in the first place. (But thankfully my hoster has a tool to write these files – actually I wrote that piece of software!)
  • I tested the RSS feeds from the new blog in Google Reader just at the moment when the .htaccess file was broken. Thus, Google Reader cached a non-working version of the page with 0 posts. And as Google Reader shares that cache between its users, I knew that anyone trying to subscribe to the feed would see the same thing. It’s fixed now though (by time).
  • I’m sure there were more problems but I can’t remember. I should have written them down as I was working; after all, the probability of me not writing a blog entry about the difficulties of installing the new blog software was clearly not particularly high.
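The first two importer bugs above can be sketched in Python (the real importer is PHP; this is not its code). The feed text, the regular expression and the helper `normalise_html` are my own illustrations, and the title-escaping bug is not covered here.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical feed: the <item> tag carries an attribute, as in the uboot feed.
FEED = (
    '<rss version="2.0"><channel>'
    '<item x="y"><title>First post</title>'
    '<description>&lt;p&gt;line one\nline two&lt;/p&gt;</description></item>'
    '</channel></rss>'
)

# The fragile approach: only a bare "<item>" matches, so nothing is found.
regex_items = re.findall(r"<item>(.*?)</item>", FEED, flags=re.DOTALL)

# A real XML parser does not care about attributes on the tag.
root = ET.fromstring(FEED)
parsed_items = root.findall("./channel/item")


def normalise_html(html: str) -> str:
    """Replace newlines with spaces so a later newline-to-<br> step has nothing to convert."""
    return re.sub(r"\s*\n\s*", " ", html)


posts = [
    {
        "title": item.findtext("title"),
        "body": normalise_html(item.findtext("description")),
    }
    for item in parsed_items
]
```

The regular expression finds no items at all, which is exactly the “successfully imported 0 posts” situation, while the parser finds the one post and the body comes out with the newline replaced by a space.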

So essentially had I not been deeply familiar with PHP, HTML, Javascript, .htaccess files, FTP, XML and (to an extent) CSS, I would not have made it. This is not something I would recommend for “Aunt Tillie”.

Anyway, now it’s done, and as I now maintain this software in contrast to before, I look forward to also having to fix it when it breaks randomly in the future (as inevitably software always does).

Making progress with introduction of unit tests to Uboot

Monday, May 14th, 2007

The old uboot code had, amazingly enough, 21k lines of unit tests. But they were not useful unit tests, as one had to run each program individually, and they each had a bunch of (different) prerequisites, such as account_id 3 existing and having an empty inbox, and so on. And with the older tests, their output would be a bunch of print statements (e.g. insert message; print count of messages), and one would have to compare the printed output with the expected results (which weren’t documented anywhere).

I am converting them to PerlUnit (which is a clone of JUnit) so that we can automatically and easily run as many tests as possible before each release. This is an incredibly productive task, as I don’t even need to write new unit tests (and think about testing strategy), I’m just converting the lines to a format enabling them to be convenient to run!

So far 3.6k lines in 86 test functions in 33 test classes :)

$ ./test.pl
...................................................
...................................
Time: 48 wallclock secs ( 8.65 usr  0.56 sys +  0.02 cusr  0.25 csys =  9.48 CPU)

OK (86 tests)
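To show the kind of conversion I mean, here is a sketch using Python’s unittest (another JUnit-style framework) rather than the actual PerlUnit code. The `Inbox` class is a hypothetical stand-in for the messaging code; what matters is that each test creates its own prerequisites and states its expected result as an assertion instead of printing it.

```python
import unittest


class Inbox:
    """Hypothetical stand-in for the messaging code under test."""

    def __init__(self):
        self._messages = []

    def insert(self, text: str) -> None:
        self._messages.append(text)

    def count(self) -> int:
        return len(self._messages)


class InboxTest(unittest.TestCase):
    def setUp(self):
        # Each test builds its own empty inbox instead of relying on
        # "account_id 3 exists and has an empty inbox".
        self.inbox = Inbox()

    def test_new_inbox_is_empty(self):
        self.assertEqual(self.inbox.count(), 0)

    def test_insert_increases_count(self):
        self.inbox.insert("hello")
        # The expected result lives in the test, not in someone's memory.
        self.assertEqual(self.inbox.count(), 1)


def run_all() -> unittest.TestResult:
    """Run every test in one go, as a release check would."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(InboxTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_all()
```

The old style would have been “insert message; print count of messages” and a human comparing the printout; here the runner does the comparing.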

Transferring some hex. Sometimes gets replaced by string "INF". Why?

Thursday, May 10th, 2007

This was never going to work out. Data transfer interface. Our side in Perl and their side in PHP. Both scripting languages (bad) and not even the same scripting language (incompatible badness).

Over the data transfer interface, we are transferring users. Including a code to enable them to unsubscribe from an email newsletter. The first 7 characters of the code identify the users (digits) and the rest of the code is a hex string containing some security information.

All works great. But some users can’t use the code? It turns out on the destination system they have “INF” in the field instead of the code.

It turns out that some of these users have e.g. 1234567 to identify the user, and e.g. 123e1234567 as their hex code. That makes the security code “1234567123e1234567”. And that “looks like” a floating point number to Perl. But quite a big one. Almost as big as Infinity in fact, so might as well call it that.

I hardly think the flexibility we “won” through every data instance having its own type based on what its data “looks like” compensates for the anger of a segment of our users not being able to unsubscribe from their newsletter, or for the extra expense to the company of the time to debug this problem (which was then an urgent problem, as it was only discovered after the system went live, because it only affected 0.6% of our users).

P.S. my solution was to put a space in front of the code, which is taken off by the receiving system, so the data always “looks like” a string. But I wouldn’t like to guarantee that what “looks like” a string won’t change with the next version of the Perl SOAP client libraries we are using.
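The effect can be reproduced in a few lines of Python. To be clear, the `NUMBER_LIKE` rule and `guess_and_serialise` below are a hypothetical stand-in for “guess the type from what the data looks like”; they are not the actual logic of the Perl SOAP libraries, which I have not reproduced here.

```python
import re

# Hypothetical "looks like a number" rule, standing in for whatever the
# real serialiser does; it is not the Perl SOAP library's actual logic.
NUMBER_LIKE = re.compile(r"[+-]?\d+(\.\d+)?([eE][+-]?\d+)?")


def guess_and_serialise(value: str) -> str:
    """Send number-looking strings as numbers, everything else as strings."""
    if NUMBER_LIKE.fullmatch(value):
        number = float(value)  # a huge exponent overflows silently to infinity
        return "INF" if number == float("inf") else repr(number)
    return value


user_digits = "1234567"
hex_part = "123e1234567"       # valid hex, but also reads as an exponent
code = user_digits + hex_part  # "1234567123e1234567"

broken = guess_and_serialise(code)
# The workaround: a leading space stops the value "looking like" a number,
# and the receiving side strips the space off again.
safe = guess_and_serialise(" " + code).lstrip()
```

Only codes whose hex part happens to be digits, a single “e”, then more digits fall into the trap, which fits with it affecting only a small fraction of users.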

Fast

Wednesday, March 28th, 2007

From our Oracle test instance:

1360965 rows created.
Elapsed: 00:01:17.90

That's 1¼ minutes to insert (and index) over 1¼ million rows.
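For the curious, the same figures as a rate, worked out in Python from the two numbers in the output above (nothing else is assumed):

```python
rows = 1_360_965
elapsed_seconds = 1 * 60 + 17.90  # "00:01:17.90"

rows_per_second = rows / elapsed_seconds
minutes = elapsed_seconds / 60
```

That comes out at a little over 17,000 rows per second.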

And this is a very old test instance. I think the hardware was last updated 2-3 years ago.

That's pretty quick!

bugs

Monday, April 3rd, 2006

well really a lot of things were far from optimal about the software. lots of bugs but a lot of things which were integration troubles, i.e. one bit of software worked 95% and another software worked 95% and together they worked 0%. today and yesterday sat with smo and went through a whole bunch of software from a whole bunch of people and just hacked away until it worked. now there are last 5 galleries etc on the start page which is quite cool.

new galleries is not as cool as it could be, as they are "checked", i.e. when the customer care agents go home then there is no new checked content. but the newest blogs are interesting as they are not checked.

surprisingly people tend to use the new blogs much as the old galleries, i.e. just lots and lots of photos. surely the gallery is the more appropriate forum for such content. but hey, if the users like it, that's all i care about.