Business 2013

This is a report of business activities in 2013. (Similar to the report for last year, 2012).

My employee MartinL (software development) has remained with me for most of 2013, although returning to university in the latter half of the year. New employee DavidZ (software development) has joined.

I have been getting into outsourcing for a few years. Not for the obvious reason of reducing customer costs (that’s just a side-effect), but for the fact that good people in Vienna are often already engaged. I don’t want to have to tell my customers “We can do the project, but only in 6 months..” so I’ve had to look for alternatives. We’ve had some good and bad experiences (as is to be expected), but now I have at least one person who delivers excellent quality and is very reliable.

I have diversified away from just software development into project management, requirements, Photoshop, HTML and JavaScript. This has been made possible by my extended team.

I have also started to do basic websites, based on WordPress. This is not the most lucrative line of work, and I’m not sure if it’s a good direction to go in, but it seems to be what the market demands. And I don’t like to say “no” to customers :-)

A sadder note in 2013 is that I’ve started to use KSV (debt collection agency) to collect money from some of my customers. It seems there was a lot of unwillingness to pay by some of my customers (not all!). Henceforth I shall be taking a harder line; any customer with any debt will not be able to order new features at all (even if they are very persuasive salespeople).

We have taken the following software online for our customers in 2013:

  • firstbird has been architected and programmed by myself and the team. This was a major undertaking, over 1.5k hours of effort. Companies can post job openings, Employees of the company can recommend people for the jobs. Companies can rate and hire candidates. Employees are kept in the loop all the time via feeds and various email notifications. Employees get points, monetary rewards and badges for their efforts. Information posted and captured from Facebook, XING, LinkedIn and Google Contacts. PDF CVs automatically generated from XING and LinkedIn information, etc.
  • mobilreport imports mobile phone bills and analyzes them. We have 40+ paying customers; big companies in Austria and Germany with large phone bills covering up to a few thousand employee mobile phones. We take the call log data from the mobile provider and parse them. Often these are very nasty formats, e.g. CSV files where certain cells contain unescaped “,” characters, i.e. parseable only by heuristics. The operators have no interest in you parsing their data and applying what you learn from it to save yourself money, so they’re not keen to fix such bugs. We now support the following formats: A1 AT (EGN and 2 Landline formats), COLT AT Landline, Telering AT, T-Mobile AT, T-Mobile DE (CSV and EDIFACT), Vodafone DE (EDIFACT).
  • For one major customer in Austria we extended mobilreport with the facility to calculate employee limits (i.e. employee gets a phone from the company, may use up to €20 per month, the rest gets deducted from their salary; bosses may use up to €100, various calls don’t count). They are informed about their limit exceedance via SMS.
  • mobilreport now has the facility for employees to log in and see the calls they’ve done with their company mobile phones. SSO from the company’s intranet to our system means the employee doesn’t have any extra passwords to remember.
  • Demonstration for a major Austrian vendor of a “command and control” center for police, with various projectors, controlled via iPads. A “game” was described in XML (firstly this scene, then if the user selects “send car” then this scene..), this is parsed on the iPad, commands sent via XML over HTTP to VLC instances connected to the projectors. Used web-technologies; written in GWT by MartinL and myself.
  • Extended a PHP system, back-end for an iPhone application for authorization and location tracking, with various pieces of new functionality (DavidZ)
  • Matchmatrix event site (alas the event was a failure so website was taken offline and I was never paid; case is with debt collection at the moment.)
  • Calendargrads website was implemented in WordPress for under €500 (logo was not done by us)

Jetty doesn’t show errors on web application start-up

From a certain version of the “jetty” package in Debian Linux, if the web application didn’t start up (servlet init() throws an Exception), this error wasn’t logged anywhere. The solution is to install the libjetty-extra package.

sudo apt-get install libjetty-extra

It took some amount of experimentation to find the solution. I don’t know why you’d ever want to not log errors; i.e. why the logging of errors is an “extra”.

Jetty is a Java web server, similar to Tomcat, and it’s my server of choice. It’s simple, doesn’t seem to do much apart from run WARs, and (apart from this issue) I’ve rarely had any problems with it.

Wrapping IDs in objects

In Java, I like to wrap ID values in objects, rather than just passing them around the code as their native “int” or “long” or “String” values.

There are two reasons why using an object is better:

  • Code becomes more readable, for example foo(LoginId x) is more readable than foo(long x). (Although perhaps neither foo nor x is a good name, so perhaps the example exaggerates this improvement.)
  • The compiler can do more checking. If you pass a job advert’s id (as a long) to a function expecting a login id (as a long), the compiler cannot warn you of your mistake. This becomes particularly relevant if you have a function taking two IDs and you pass them the wrong way around.

Things to consider when writing such an “ID object”

  • Do not allow the ID contained within the object to be null. Having “LoginId x” where x is not null, but the value contained within x is null, makes no sense. (For example, use primitive types if dealing with numerical IDs, as they cannot be null.)
  • If the ID is a string, don’t allow this string to be empty; same as above.
  • Implement equals and hashCode methods so that these IDs can be used within Sets, or as keys within Maps, or as keys in Wicket drop-downs, or wherever.
  • Make them serializable.
  • Make the ID attribute a “public final” attribute. That means useless getter methods can be avoided, from the object itself and from client code.
  • The object should have a single constructor which takes the value and sets it in the attribute.
  • Implement toString so that the debugger can display the object usefully.
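
The checklist above can be sketched as a small class. This is only a minimal illustration: LoginId is the name used in the example earlier in this post, but the field name and the toString format are my own assumptions, not a fixed convention.

```java
import java.io.Serializable;

/** Hypothetical wrapper around a numeric login ID, following the checklist above. */
final class LoginId implements Serializable {
    private static final long serialVersionUID = 1L;

    /** A primitive long cannot be null, so no null check is needed. */
    public final long id;

    /** Single constructor that takes the value and stores it. */
    LoginId(long id) {
        this.id = id;
    }

    /** Two LoginIds are equal exactly when they wrap the same value. */
    @Override
    public boolean equals(Object other) {
        return other instanceof LoginId && ((LoginId) other).id == id;
    }

    /** Consistent with equals, so the ID works in HashSets and as a HashMap key. */
    @Override
    public int hashCode() {
        return Long.hashCode(id);
    }

    /** Readable form for debuggers and log output (format is an arbitrary choice). */
    @Override
    public String toString() {
        return "LoginId(" + id + ")";
    }
}
```

With a second class of the same shape for, say, job advert IDs, a method declared as foo(LoginId login, JobAdvertId advert) no longer compiles if the two arguments are passed the wrong way around. For a String-based ID, the constructor would additionally reject null and empty values.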

“New Game!!!”

I am writing this blog post in October 2012; this will get published automatically in one year, like government secrets published 50 years after the event. But this just cannot not be shared.

One of my flatmates has just written an email to all:

Subject: New Game!!!

It’s time for a little Game called… … …  “Clean your dishes and live in peace” !!

Few assertions:

  • Human being has to live in an hygenic place
  • Everyone is respectful
  • Time to:  putting your plates, glasses, cutleries in the dishwasher + clean by yourself pan, saucepan(Yes, because as you know hot water is expensive and Earth ask us to do not waste it)  =  takes no more than 5 minutes
  • A clean kitchen is an area where everyone can go and enjoy it (because currently it’s not really welcoming)
  • Live in a sharing flat is also comply with basics rules

If anyone is not interested about this “game” we are all enough mature to talk about it.

Also, if anyone has a request to do : It’s time to do it !!

The person who was implicitly addressed in this email wrote back, also to all:

Hey. We kind of established a “policy” that its ok to let dirty dishes lie around for 24 or even 48h at certain occasions. At least no one ever had a problem with that, as long as its not getting to worse.. I personally think that maximum 1 or occasionly maybe a few days is a good timefraime to have and would not like to be forced to always clean everything right after eating. Sure we are mature enough to dicuss this.

How happy do you reckon the original author is going to be with this reply??

Made a mistake? Have a pay rise!

The following two viewpoints are not consistent:

  1. “Come and work at our company. We are looking for someone with lots of experience. The more experience you have the more you’ll get paid!”
  2. “You made a mistake. You’re fired!”

You can pick up knowledge at university and through reading books. But companies still want experience in addition to knowledge. For me, experience is very much having seen what works, and what doesn’t work, by having done things that work, and done things that don’t work.

So in my opinion, if you’re going to pay someone joining a company a higher salary if they’re more experienced, if your existing team member walks in to your office and states “sorry I just made a mistake, my bug wiped the production database”, you should reply, “congratulations, this is one mistake you won’t make again, now you are more experienced, have a pay rise”.

For those who don’t know me, it’s worth pointing out that the above is not intended to be 100% serious. Nevertheless, I think the principle stands, that, assuming you’re basically good at what you do, making a mistake increases your experience and thus generally increases your value to your employer.

Sunday was quite a day

On Friday I learned that I must create a website for a customer, and that P.R. was going out about the website’s existence Sunday evening. On Saturday I was out all day at a wedding. So Sunday was going to be the day when it was both going to be started, and had to be completed.

Amazingly, my colleague Mae and I did the website in the day. Mae designed the entire HTML from scratch. I found out how to integrate with PayPal, and implemented the API during that day: including using the IPN notification API to register successful payment and decrease the number of tickets that were available (limited guest-list to an event.)

It’s a website where you can sign up to the MatchMatrix event; it’s a dating event taking place in a Vienna club in August, limited to 250 places for men and women.

Let’s see if the event is successful, but Elmar Rassi, the event co-organizer, has over 100k likes on Facebook, so must be pretty famous. [Update 29 Jul: No tickets were sold; website is offline at the request of the customer]

Mae is one of my remote workers from the Philippines who I found over the internet. I like to work with remote workers, as it increases my capacity, i.e. I can deliver more work to my customers that way. I would increase my team size in Vienna, but people who are good are already employed.

Although my and my team’s speciality is software development for websites, every now and again I get a request to just make “a website”, without a lot of software. I’m happy to do these projects as well. Although, ideally not on quite such short notice.

If you’re going to change databases, do it in one go

At Uboot, there was the suggestion that we should change database technology. This was because we would save money by moving away from Oracle to e.g. MySQL.

It was further decided, as we had 200k+ lines of code, and 100+ tables, that if a migration was to occur, it shouldn’t happen “all at once”, but instead one would have some parts of the system migrated with other parts of the system not yet migrated.

But the strategy of migrating databases piece by piece has the following problems:

  1. Data consistency. Initially, all important data are within the source database. It is possible to guarantee certain consistency constraints (e.g. all photos are owned by existing users). However, if e.g. photos are migrated to MySQL, while users remain in Oracle, no such consistency can be guaranteed. Inconsistency has the consequence that software must accept not only valid data, but also invalid data. It is impossible to test the software for all combinations of invalid data, and with the rule that “software which is not tested does not work”, one can deduce that the software will then not work. The set of inconsistencies will increase over time, so the set of users who experience bugs will also increase over time. Afterwards, going from an inconsistent state to a consistent state is nearly impossible.
  2. Reporting over multiple system areas. With one database, one can ask a question such as “how many photos belong to users logged in in the last month”. A “SQL join” is used. However if the tables reside in different databases, no such query can be asked. One would have to export the data from one database to another, in order to ask the query. Or export all data into a centralized data warehouse. Or write software to do what a single line of “join” would do. All this effort is saved, by using a single database.
  3. Point-in-time recovery. With one system, if the database crashes or an error is made (e.g. “drop table account” is executed by an employee), a “point-in-time recovery” can be performed. Data is restored to its consistent state as it was at a certain point in time. If one uses more than one database, a point-in-time recovery will be more difficult. For example if the databases are out-of-sync by 1 second, then some photos will exist for which there are no users, or vice versa, and data consistency problems exist as described in point 1.
  4. Spending money on migration; no saving in license fees. If a migration is done, time/money is spent. Not only in doing the migration, but fixing performance problems after the system is done. If money can be saved on the Oracle license, perhaps this time is worth it, however if one is using multiple databases then one is still paying for Oracle.

The solution is either to change databases in one go, or simply don’t change databases. You want to have your data in one place, at all times.
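
To make point 2 concrete, here is a rough sketch of the software one ends up writing when photos and users live in different databases. Everything in it is hypothetical (the class name, the method name, and the idea that each database has already been queried into an in-memory collection); it is not code from Uboot.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Map;

final class CrossDatabaseReport {
    /**
     * Application-side stand-in for a single query such as:
     *   SELECT COUNT(*) FROM photos p JOIN users u ON p.owner_id = u.id
     *   WHERE u.last_login >= ?
     * (table and column names are assumptions for illustration).
     *
     * @param lastLoginByUserId loaded from the database that holds the users
     * @param photoOwnerIds     one entry per photo, loaded from the other database
     */
    static long photosOfRecentUsers(Map<Long, LocalDate> lastLoginByUserId,
                                    List<Long> photoOwnerIds,
                                    LocalDate since) {
        long count = 0;
        for (Long ownerId : photoOwnerIds) {
            LocalDate lastLogin = lastLoginByUserId.get(ownerId);
            if (lastLogin == null) {
                // Orphaned photo: its owner is missing from the other database.
                // With a single database, a foreign key would rule this out.
                continue;
            }
            if (!lastLogin.isBefore(since)) {
                count++;
            }
        }
        return count;
    }
}
```

Even this small version has to decide what to do with photos whose owner cannot be found, which is exactly the kind of invalid data described in point 1.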

Minimal security is gained by using case-sensitive passwords

My colleague suggested that systems should compare passwords in a case-sensitive manner. He pointed out the larger space of possible passwords, and thus the longer passwords would take to crack by a brute-force attack. This is all standard knowledge.

I still maintain, however, as discussed before, that the correct trade-off between usability and security is to compare passwords in a case-insensitive manner.

The increase in usability in having case-insensitive passwords is obvious (and documented in the previous post), so to understand the usability/security trade-off, how much is security reduced by using them?

The following calculations assume the following:

  • The NSA, or whoever wishes to attack you, has access to the hashed password (e.g. database leak)
  • They have a strong machine such as this one
  • They use a brute-force attack and try every combination (they don’t just use dictionary words)
  • Assuming you’re a person of interest, let’s say they are prepared to expend 2 months running the machine at full power just to crack your password
  • A case-sensitive password has 80 possible characters to choose from (upper-case, lower-case, numbers, symbols), a case-insensitive normalized password as described in my previous blog post has 54 possible characters.

So how long would your password have to be to defeat the above attacker, with case-sensitive and case-insensitive passwords?

SHA1 Passwords

The machine above can try 63 billion passwords per second. That means, in the 2 months available, it can try out 3.4×10^17 passwords.

  • A case-sensitive password with length 10 has 80^10 ≈ 10.7×10^18 possibilities, so cannot be cracked
  • A case-insensitive password with length 11 has 54^11 ≈ 11.4×10^18 possibilities, so cannot be cracked

So the “lack of security” imposed by case-insensitivity can be mitigated by having a single extra character in your password, or, put another way, making your password 10% longer.
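
The comparison can be re-run with a few lines of Java. This is a minimal sketch under stated assumptions: the class and method names are made up, and “2 months” is taken to be 61 days. The alphabet sizes (80 and 54) and the rate of 63 billion guesses per second are the figures from this post.

```java
import java.math.BigInteger;

final class PasswordSpace {
    /** Number of distinct passwords of exactly the given length over the given alphabet. */
    static BigInteger combinations(int alphabetSize, int length) {
        return BigInteger.valueOf(alphabetSize).pow(length);
    }

    /** Total guesses an attacker can make at a constant rate over a number of days. */
    static BigInteger guesses(long perSecond, long days) {
        long seconds = days * 24 * 60 * 60;
        return BigInteger.valueOf(perSecond).multiply(BigInteger.valueOf(seconds));
    }

    /** True if the search space is larger than the attacker's guess budget. */
    static boolean survives(int alphabetSize, int length, BigInteger budget) {
        return combinations(alphabetSize, length).compareTo(budget) > 0;
    }

    public static void main(String[] args) {
        // Assumption: "2 months" = 61 days.
        BigInteger budget = guesses(63_000_000_000L, 61);
        System.out.println("guess budget:             " + budget);
        System.out.println("80 symbols, length 9:     " + survives(80, 9, budget));
        System.out.println("80 symbols, length 10:    " + survives(80, 10, budget));
        System.out.println("54 symbols, length 10:    " + survives(54, 10, budget));
        System.out.println("54 symbols, length 11:    " + survives(54, 11, budget));
    }
}
```

BigInteger is used only to avoid worrying about overflow; the numbers involved would also fit in a double.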

To those who would argue that it’s likely people will use random 10 character passwords but not random 11 character passwords: I propose that there are those of us who will generate n character passwords using a tool (the site is free to suggest n, meaning its value doesn’t matter), and there are those who would use their pet’s name as their password, in which case even a case-sensitive password is insecure, meaning case-sensitivity doesn’t matter either.

bcrypt(10)

But let’s try a more realistic example. Who uses SHA1? We use bcrypt, like presumably everyone else.

bcrypt has a strength parameter, n. It re-hashes the password 2^n times. So each time you increase the strength parameter by one, it takes twice as long to calculate. By default this strength parameter is 10, which is fine for us: it takes our server 0.1 seconds to calculate such a hash.

The web-page says that monster machine can do 71k bcrypt(5) passwords per second. So that means it can calculate 2.2k bcrypt(10) passwords per second. Meaning in the two months, it can calculate 1.1×10^10 passwords. So that means:

  • A case-sensitive password with length 6 has 26×10^10 possibilities, so cannot be cracked
  • A case-insensitive password with length 6 has 2×10^10 possibilities, so cannot be cracked

So we find out that with a normal hashing strategy, the password doesn’t have to be made longer to remain at the same level of security.
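
The scaling from bcrypt(5) to bcrypt(10) and the resulting minimum password length can be sketched as follows. The names are invented for illustration, “two months” is again assumed to be 61 days, and the only thing assumed about bcrypt is what is stated above: each increment of the strength parameter doubles the work.

```java
final class BcryptEstimate {
    /** Each extra cost step doubles the work, so the guess rate halves per step. */
    static double rateAtCost(double knownRate, int knownCost, int targetCost) {
        return knownRate / Math.pow(2, targetCost - knownCost);
    }

    /** Shortest length whose search space is strictly larger than the guess budget. */
    static int minimumLength(int alphabetSize, double guessBudget) {
        int length = 0;
        double space = 1;
        while (space <= guessBudget) {
            space *= alphabetSize;
            length++;
        }
        return length;
    }

    public static void main(String[] args) {
        double rate = rateAtCost(71_000, 5, 10);     // guesses per second at bcrypt(10)
        double budget = rate * 61 * 24 * 60 * 60;    // assumption: 2 months = 61 days
        System.out.println("bcrypt(10) guesses per second: " + rate);
        System.out.println("minimum length, 80 symbols:    " + minimumLength(80, budget));
        System.out.println("minimum length, 54 symbols:    " + minimumLength(54, budget));
    }
}
```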

The “lack of security” imposed by case-insensitive passwords means that the password either has to be slightly longer, or not longer at all. The usability advantages are very real. So that, in my mind, makes the usability vs security trade-off a clear win for case-insensitive passwords.

When and who should fix bugs?

There are (at least) the following decisions that a project manager must make when organizing bug fixes to software:

  • Should the original author of the code in question fix the bug? Or should there be a “bug team” who surgically go in and fix bugs?
  • Should one fix bugs as one goes along? Or concentrate on features and write the bugs down and fix them in a “bug sprint”?

I am very much in favour of the original author fixing bugs, and fixing bugs as soon as they occur. Because:

  • It’s difficult to predict how long it will take to fix a bug, e.g. estimates might be “between 5 minutes and 4 hours”. It’s easier to see how far through the project you are if you have 50% of the features working without bugs, than if you have 75% with an unknown number of bugs each taking an unknown length of time to fix.
  • If a programmer makes a mistake, it’s important that they learn from it, so they won’t make it next time. That’s why the original author should fix the bug.
  • Do you put your best programmers or your worst programmers on bug fixing? Bug fixing is tricky, so if you put your worst programmers on bug fixing they’ll only make the situation worse. If you put your best programmers on bug fixing, they’ll all quit.
  • If one person is maintaining one piece of code, they feel “ownership” over it. This is perhaps the opposite of the desirable quality that everyone knows the code. Nevertheless, I think “ownership” is a powerful concept that causes people to take more care of their work than they otherwise would, leading to a better piece of software.
  • Assuming that bugs are fixed immediately, and you find a bug in an existing part of the system, you’ll report it and/or fix it. If bugs are left until the end, and you see a bug, you’ll just ignore it, as you know there are existing bugs. This might cause a new/different bug to not get reported.
  • If you fix something weeks later, you might have forgotten the original code, including weird important special cases. Ideally everything would be perfectly documented and readable, but that often isn’t the case in reality. Having the special cases in your head will prevent your bug fix from actually being an introduction of further bugs.

One of my old bosses used to utter the phrase “you break it, you fix it!”. I never had good associations with that phrase; every time I heard it, new pieces of work were only seconds away from being assigned to me :-)

Don’t use constants for table and column names when writing SQL statements

I was always in two minds about using constants for table and column names when writing SELECT queries. Now I’ve concluded that constants are definitely bad, and should not be used. Here’s why.

The topic of discussion is the difference between writing

sql = "SELECT * FROM " + TABLE_NAME + " WHERE ..."

and

sql = "SELECT * FROM my_table WHERE ..."

There are the following consequences from this choice as far as I can see:

  • A compiler like Java’s can tell you if you misspell a constant’s name (e.g. TABLE_NAME) but not if you misspell “my_table” in the middle of a string (Win for constants, if you’re using a statically typed language)
  • You can rename the values easier: just change and rename the constant. But how often do just table names or columns get renamed? Normally when I change the database I am implementing a new feature, and everything has to be changed anyway. (Marginal win for constants)
  • The layout of the query is shorter and easier to read if constants are not used. (Win for not using constants)
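
The trade-offs in the list above can be seen side by side in a small sketch. The constant names and the login_id column are hypothetical; both methods build exactly the same SQL text, and only the first gets any help from the compiler.

```java
final class QueryStyles {
    // Hypothetical constants, as a class using the "constants" style might declare them.
    static final String TABLE_NAME = "my_table";
    static final String COLUMN_LOGIN_ID = "login_id";

    /** Misspelling TABLE_NAME or COLUMN_LOGIN_ID here is a compile-time error. */
    static String withConstants() {
        return "SELECT * FROM " + TABLE_NAME + " WHERE " + COLUMN_LOGIN_ID + " = ?";
    }

    /** Misspelling my_table here is only noticed when the query runs. */
    static String inline() {
        return "SELECT * FROM my_table WHERE login_id = ?";
    }

    public static void main(String[] args) {
        System.out.println(withConstants());
        System.out.println(inline());
    }
}
```

Neither style protects against the query being logically wrong, for example a missing WHERE condition; the compiler sees a valid string either way.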

To avoid having this choice in the first place:

  • One could use an ORM, but at least in Java, e.g. HQL in Hibernate still has a string for column names, and table names if you’re doing joins, so the problem is still there.
  • Using a system like LINQ in .NET which allows you to specify queries in a way the compiler understands, not just a string. (But can it do everything SQL can do including vendor-specific things?)
  • Being able to extend the language with other languages such as SQL and regular expressions. This is a fantasy of mine and a friend, hasn’t happened yet. This would work by the compiler working in conjunction with the database engine to assert that the query is valid at compile time (and possibly even creating an db-specific internal parsed representation right there and then.)

Compare the following two pieces of code and I think the choice will become obvious. Both pieces of code come from the current code-base I’m working on; I wrote neither of them myself.

sb.append("SELECT ").append(RecruiterRole.TABLE_NAME).append(".*,");
sb.append(Login.TABLE_NAME).append(".*");
sb.append(" FROM ").append(RecruiterRole.TABLE_NAME);
sb.append(',').append(Login.TABLE_NAME);
sb.append(" WHERE ");
sb.append(RecruiterRole.COLUMN_COMPANY_ID).append(" = ?");
sb.append(" AND ");
sb.append(RecruiterRole.TABLE_NAME).append('.');
sb.append(RecruiterRole.COLUMN_LOGIN_ID).append("=?");

vs

sql.append(" SELECT * ");
sql.append(" FROM application");
sql.append(" WHERE job_advert_id IN (");
sql.append("   SELECT job_advert_id");
sql.append("   FROM share");
sql.append("   WHERE talent_scout_login_id = ?)");
sql.append(" AND potential_applicant_identity_id NOT IN (");
sql.append("   SELECT potential_applicant_identity_id");
sql.append("   FROM positive_endorsement");
sql.append("   WHERE talent_scout_login_id = ?)");
sql.append(" AND company_id = ?");
sql.append(" AND share_talent_scout_login_id = ?");
sql.append(" ORDER BY datetime_utc DESC");

Here are the reasons why the second code is more readable:

  • Because table/column names are inline, the code reads easier
  • Indenting is used for sub-selects
  • Each condition, order-by is on its own line (e.g. “AND company_id=?”)
  • Keywords uppercase, column and table names lowercase

The danger of the above code is less that spelling errors will only be detected at run-time rather than at compile-time, and more that the query does the wrong thing (while appearing to do the right thing). For example, an error I saw recently (which obviously did not make the live system!) was that users could see data not only from themselves but from all users because the “WHERE login_id=?” had been forgotten. But to the untrained eye, or a user on the test system with only a few users, the query appeared to work.

In this case, it’s a clear win for readability, over compile-time checking of a mistake which is unlikely to happen and will be identified at run-time.