<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Databases and Life</title>
	<atom:link href="http://www.databasesandlife.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.databasesandlife.com</link>
	<description>Adrian Smith's blog</description>
	<lastBuildDate>Tue, 18 Jun 2013 08:23:49 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>Minimal security is gained by using case-sensitive passwords</title>
		<link>http://www.databasesandlife.com/minimal-security-is-gained-by-using-case-sensitive-passwords/</link>
		<comments>http://www.databasesandlife.com/minimal-security-is-gained-by-using-case-sensitive-passwords/#comments</comments>
		<pubDate>Tue, 18 Jun 2013 08:23:49 +0000</pubDate>
		<dc:creator>adrian</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.databasesandlife.com/?p=1578</guid>
		<description><![CDATA[My colleague suggested that systems should compare passwords in a case-sensitive manner. He pointed out the larger space of possible passwords, and thus the longer passwords would take to crack by a brute-force attack. This is all standard knowledge. I still maintain, however, as discussed before, that the correct trade-off between usability and security is [...]]]></description>
				<content:encoded><![CDATA[<p>My colleague suggested that systems should compare passwords in a case-sensitive manner. He pointed out the larger space of possible passwords, and thus the longer passwords would take to crack by a brute-force attack. This is all standard knowledge.</p>
<p>I still maintain, however, as <a title="Strong Passwords: Education not enforcement" href="http://www.databasesandlife.com/strong-passwords-education-not-enforcement/">discussed before</a>, that the correct trade-off between usability and security is to compare passwords in a case-insensitive manner.</p>
<p><strong>The increase in usability in having case-insensitive passwords is obvious (and documented in the previous post), so to understand the usability/security trade-off, how much is security reduced by using them?</strong></p>
<p>The following calculations assume the following:</p>
<ul class="tight">
<li>The NSA, or whoever, wishes to attack you, has access to the hashed password (e.g. database leak)</li>
<li>They have a strong machine such as <a href="https://securityledger.com/2012/12/new-25-gpu-monster-devours-passwords-in-seconds/">this one</a></li>
<li>They use a brute-force attack and try every combination (they don&#8217;t just use dictionary words)</li>
<li>Assuming you&#8217;re a person of interest, let&#8217;s say they are prepared to expend 2 months running the machine at full power just to crack your password</li>
<li>A case-sensitive password has 80 possible characters to choose from (upper-case, lower-case, numbers, symbols), a case-insensitive normalized password as described in my previous blog post has 54 possible characters.</li>
</ul>
<p>So how long would your password have to be to defeat the above attacker, with case-sensitive and case-insensitive passwords?</p>
<h3>SHA1 Passwords</h3>
<p>The machine above can try 63 billion passwords per second. That means, in the 2 months available, it can try out 3.4×10<sup>17</sup> passwords.</p>
<ul class="tight">
<li>A <strong>case-sensitive</strong> password with <strong>length 10</strong> has 80<sup>10</sup> ≈ 1.07×10<sup>19</sup> possibilities, so cannot be cracked in that time</li>
<li>A <strong>case-insensitive</strong> password with <strong>length 11</strong> has 54<sup>11</sup> ≈ 1.14×10<sup>19</sup> possibilities, so cannot be cracked in that time</li>
</ul>
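<p>The arithmetic above can be reproduced with a short Java sketch. The class and method names are invented for illustration, and &#8220;2 months&#8221; is assumed to mean 60 days, so the budget comes out a little below the rounded figure quoted above.</p>
<pre>import java.math.BigInteger;

final class BruteForceBudget {
    // Total guesses an attacker can make at the given rate over the given number of days.
    static BigInteger guesses(long perSecond, int days) {
        BigInteger seconds = BigInteger.valueOf(days).multiply(BigInteger.valueOf(86_400));
        return BigInteger.valueOf(perSecond).multiply(seconds);
    }

    // Shortest password length whose number of combinations exceeds the budget.
    // Assumes alphabetSize is at least 2, otherwise the loop never ends.
    static int minimumLength(int alphabetSize, BigInteger budget) {
        BigInteger combinations = BigInteger.ONE;
        int length = 0;
        while (true) {
            combinations = combinations.multiply(BigInteger.valueOf(alphabetSize));
            length++;
            if (combinations.compareTo(budget) > 0) {
                return length;
            }
        }
    }
}</pre>
<p>With 63 billion guesses per second for 60 days, minimumLength(80, budget) returns 10 and minimumLength(54, budget) returns 11, the two lengths in the list above.</p>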
<p>So the &#8220;lack of security&#8221; imposed by case-insensitivity can be mitigated by having a single extra character in your password, or, put another way, making your password 10% longer.</p>
<p>To those who would argue that it&#8217;s likely people will use random 10 character passwords but not random 11 character passwords: I propose that there are those of us who will generate <em>n</em> character passwords using a tool (the site is free to suggest <em>n</em>, meaning its value doesn&#8217;t matter), and there are those who would use their pet&#8217;s name as their password, in which case even a case-sensitive password is insecure, meaning case-sensitivity doesn&#8217;t matter either.</p>
<h3>bcrypt(10)</h3>
<p>But let&#8217;s try a more realistic example. Who uses SHA1? We use <a href="http://codahale.com/how-to-safely-store-a-password/">bcrypt</a>, like presumably everyone else.</p>
<p>bcrypt has a strength parameter. It re-hashes the password 2<sup><em>n</em></sup> times. So each time you increase the strength parameter by one, it takes twice as long to calculate. By default this strength parameter is 10, which is fine for us: it takes our server 0.1 seconds to calculate such a hash.</p>
<p>The web page says that the monster machine can do 71k bcrypt(5) hashes per second. bcrypt(10) does 2<sup>5</sup> = 32 times as much work, so the machine can calculate about 2.2k bcrypt(10) hashes per second. In the two months, then, it can try about 1.1×10<sup>10</sup> passwords. So that means:</p>
<ul class="tight">
<li>A <strong>case-sensitive</strong> password with <strong>length 6</strong> has 26×10<sup>10</sup> possibilities, so cannot be cracked</li>
<li>A <strong>case-insensitive</strong> password with <strong>length 6</strong> has 2.5×10<sup>10</sup> possibilities, so cannot be cracked</li>
</ul>
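<p>The same estimate for bcrypt, as a rough Java sketch. The names are invented, the 71k figure is the one quoted above, and the only assumption about bcrypt is the one already stated: each step of the strength parameter doubles the work.</p>
<pre>final class BcryptBudget {
    // Hash rate at targetCost, given a measured rate at knownCost.
    // Each extra unit of cost doubles the work, so the rate halves.
    static double ratePerSecond(double knownRate, int knownCost, int targetCost) {
        return knownRate / Math.pow(2, targetCost - knownCost);
    }

    // Number of candidate passwords of the given length over the given alphabet.
    static double searchSpace(int alphabetSize, int length) {
        return Math.pow(alphabetSize, length);
    }
}</pre>
<p>ratePerSecond(71_000, 5, 10) is 2218.75 hashes per second, which over 60 days is roughly 1.15×10<sup>10</sup> guesses. Both searchSpace(80, 6) and searchSpace(54, 6) are larger than that, while searchSpace(54, 5) is not.</p>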
<p>So we find out that with a normal hashing strategy, the password doesn&#8217;t have to be made longer to remain at the same level of security.</p>
<p>The &#8220;lack of security&#8221; imposed by case-insensitive passwords means that the password either has to be slightly longer, or not longer at all. The usability advantages are very real. So that, in my mind, makes the usability vs security trade-off a clear win for case-insensitive passwords.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.databasesandlife.com/minimal-security-is-gained-by-using-case-sensitive-passwords/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>When and who should fix bugs?</title>
		<link>http://www.databasesandlife.com/when-and-who-should-fix-bugs/</link>
		<comments>http://www.databasesandlife.com/when-and-who-should-fix-bugs/#comments</comments>
		<pubDate>Mon, 17 Jun 2013 07:56:49 +0000</pubDate>
		<dc:creator>adrian</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.databasesandlife.com/?p=1591</guid>
		<description><![CDATA[There are (at least) the following options that a project manager must make when organizing bug fixes to software: Should the original author of the code in question fix the bug? Or should there be a &#8220;bug team&#8221; who surgically go in and fix bugs? Should one fix bugs as one goes along? Or concentrate [...]]]></description>
				<content:encoded><![CDATA[<p>There are (at least) the following options that a project manager must make when organizing bug fixes to software:</p>
<ul>
<li>Should the original author of the code in question fix the bug? Or should there be a &#8220;bug team&#8221; who surgically go in and fix bugs?</li>
<li>Should one fix bugs as one goes along? Or concentrate on features and write the bugs down and fix them in a &#8220;bug sprint&#8221;?</li>
</ul>
<p>I am very much in favour of the <strong>original author fixing bugs</strong>, and <strong>fixing bugs as soon as they occur</strong>. Because:</p>
<ul>
<li>It&#8217;s difficult to predict how long it will take to fix a bug, e.g. estimates might be &#8220;between 5 minutes and 4 hours&#8221;. It&#8217;s easier to see how far through the project you are if you have 50% of the features working without bugs, than if you have 75% with an unknown number of bugs each taking an unknown length of time to fix.</li>
<li>If a programmer makes a mistake, it&#8217;s important that they learn from it, so they won&#8217;t make it next time. That&#8217;s why the original author should fix the bug.</li>
<li>Do you put your best programmers or your worst programmers on bug fixing? Bug fixing is tricky, so if you put your worst programmers on bug fixing they&#8217;ll only make the situation worse. If you put your best programmers on bug fixing, they&#8217;ll all quit.</li>
<li>If one person is maintaining one piece of code, they feel &#8220;ownership&#8221; over it. This is perhaps the opposite of the desirable quality that everyone knows the code. Nevertheless, I think &#8220;ownership&#8221; is a powerful concept that causes people to take more care of their work than they otherwise would, leading to a better piece of software.</li>
<li>Assuming that bugs are fixed immediately, and you find a bug in an existing part of the system, you&#8217;ll report it and/or fix it. If bugs are left until the end, and you see a bug, you&#8217;ll just ignore it, as you know there are existing bugs. This might cause a new/different bug to not get reported.</li>
<li>If you fix something weeks later, you might have forgotten the original code, including weird but important special cases. Ideally everything would be perfectly documented and readable, but that often isn&#8217;t the case in reality. Having the special cases in your head will prevent your bug fix from actually introducing further bugs.</li>
</ul>
<p>One of my old bosses used to utter the phrase &#8220;you break it, you fix it!&#8221;. I never had good associations with that phrase: every time I heard it, new pieces of work were only seconds away from being assigned to me :-)</p>
]]></content:encoded>
			<wfw:commentRss>http://www.databasesandlife.com/when-and-who-should-fix-bugs/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Don&#8217;t use constants for table and column names when writing SQL statements</title>
		<link>http://www.databasesandlife.com/dont-use-constants-for-table-and-column-names-when-writing-sql-statements/</link>
		<comments>http://www.databasesandlife.com/dont-use-constants-for-table-and-column-names-when-writing-sql-statements/#comments</comments>
		<pubDate>Fri, 14 Jun 2013 10:25:17 +0000</pubDate>
		<dc:creator>adrian</dc:creator>
				<category><![CDATA[Coding]]></category>

		<guid isPermaLink="false">http://www.databasesandlife.com/?p=1581</guid>
		<description><![CDATA[I was always in two minds about using constants for table and column names when writing SELECT queries. Now I&#8217;ve concluded that constants are definitely bad, and should not be used. Here&#8217;s why. The topic of discussion is the difference between writing sql = "SELECT * FROM " + TABLE_NAME + " WHERE ..." and [...]]]></description>
				<content:encoded><![CDATA[<p><strong>I was always in two minds about using constants for table and column names when writing SELECT queries. Now I&#8217;ve concluded that constants are definitely bad, and should not be used. Here&#8217;s why.</strong></p>
<p>The topic of discussion is the difference between writing</p>
<pre>sql = "SELECT * FROM " + TABLE_NAME + " WHERE ..."</pre>
<p>and</p>
<pre>sql = "SELECT * FROM my_table WHERE ..."</pre>
<p>There are the following consequences from this choice as far as I can see:</p>
<ul class="tight">
<li>A compiler such as Java&#8217;s can tell you if you misspell a constant name (e.g. TABLE_NAME) but not if you misspell &#8220;my_table&#8221; in the middle of a string (Win for constants, if you&#8217;re using a statically typed language)</li>
<li>You can rename the values more easily: just change the constant. But how often do table or column names alone get renamed? Normally when I change the database I am implementing a new feature, and everything has to be changed anyway. (Marginal win for constants)</li>
<li>The layout of the query is shorter and easier to read if constants are not used. (Win for not using constants)</li>
</ul>
<p>To avoid having this choice in the first place:</p>
<ul class="tight">
<li>One could use an ORM, but at least in Java, e.g. HQL in Hibernate still has a string for column names, and table names if you&#8217;re doing joins, so the problem is still there.</li>
<li>Using a system like <a href="http://msdn.microsoft.com/en-us/library/vstudio/bb397900.aspx">LINQ</a> in .NET which allows you to specify queries in a way the compiler understands, not just a string. (But can it do everything SQL can do including vendor-specific things?)</li>
<li>Being able to extend the language with other languages such as SQL and regular expressions. This is a <a href="http://parsinglife.blogspot.co.at/2011/10/regular-expressions-should-be-handled.html">fantasy</a> of mine and a friend&#8217;s; it hasn&#8217;t happened yet. This would work by the compiler working in conjunction with the database engine to assert that the query is valid at compile time (and possibly even creating a db-specific internal parsed representation right there and then.)</li>
</ul>
<p>Compare the following two pieces of code and I think the choice will become obvious. Both pieces of code come from the code-base I&#8217;m currently working on; I wrote neither of them myself.</p>
<pre>sb.append("SELECT ").append(RecruiterRole.TABLE_NAME).append(".*,");
sb.append(Login.TABLE_NAME).append(".*");
sb.append(" FROM ").append(RecruiterRole.TABLE_NAME);
sb.append(',').append(Login.TABLE_NAME);
sb.append(" WHERE ");
sb.append(RecruiterRole.COLUMN_COMPANY_ID).append(" = ?");
sb.append(" AND ");
sb.append(RecruiterRole.TABLE_NAME).append('.');
sb.append(RecruiterRole.COLUMN_LOGIN_ID).append("=?");</pre>
<p>vs</p>
<pre>sql.append(" SELECT * ");
sql.append(" FROM application");
sql.append(" WHERE job_advert_id IN (");
sql.append("   SELECT job_advert_id");
sql.append("   FROM share");
sql.append("   WHERE talent_scout_login_id = ?)");
sql.append(" AND potential_applicant_identity_id NOT IN (");
sql.append("   SELECT potential_applicant_identity_id");
sql.append("   FROM positive_endorsement");
sql.append("   WHERE talent_scout_login_id = ?)");
sql.append(" AND company_id = ?");
sql.append(" AND share_talent_scout_login_id = ?");
sql.append(" ORDER BY datetime_utc DESC");</pre>
<p>Here are the reasons why the second code is more readable:</p>
<ul class="tight">
<li>Because table/column names are inline, the code reads easier</li>
<li>Indenting is used for sub-selects</li>
<li>Each condition, order-by is on its own line (e.g. &#8220;AND company_id=?&#8221;)</li>
<li>Keywords uppercase, column and table names lowercase</li>
</ul>
<p>The danger of the above code is not so much that errors in spelling will only be detected at run-time and not at compile-time, but that the query does the wrong thing (while appearing to do the right thing). For example, an error I saw recently (which obviously did not make it to the live system!) was that users could see data not only from themselves but from all users because the &#8220;WHERE login_id=?&#8221; had been forgotten. But to the untrained eye, or a user on the test system with only a few users, the query appeared to work.</p>
<p>In this case, it&#8217;s a clear win for readability over compile-time checking of a mistake which is unlikely to happen and will be identified at run-time anyway.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.databasesandlife.com/dont-use-constants-for-table-and-column-names-when-writing-sql-statements/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Are Java enum values instances or classes?</title>
		<link>http://www.databasesandlife.com/java-enum-classes/</link>
		<comments>http://www.databasesandlife.com/java-enum-classes/#comments</comments>
		<pubDate>Sat, 13 Apr 2013 10:43:58 +0000</pubDate>
		<dc:creator>adrian</dc:creator>
				<category><![CDATA[Java]]></category>

		<guid isPermaLink="false">http://www.databasesandlife.com/?p=1551</guid>
		<description><![CDATA[That depends on if the enum values provide methods which differ from one another. The following code produces just one class file, &#8220;Network.class&#8221;. &#8220;facebook&#8221; and &#8220;linkedIn&#8221; are just instances of the Network Java class. public enum Network { facebook, linkedIn; public void printName() { System.out.println(name()); } } But the following code produces one class file [...]]]></description>
				<content:encoded><![CDATA[<p><strong>That depends on if the enum values provide methods which differ from one another.</strong></p>
<p>The following code produces just <strong>one class</strong> file, &#8220;Network.class&#8221;. &#8220;facebook&#8221; and &#8220;linkedIn&#8221; are just instances of the Network Java class.</p>
<pre>public enum Network {
  facebook, 
  linkedIn;

  public void printName() { System.out.println(name()); }
}</pre>
<p>But the following code produces <strong>one class file for each value</strong>, named &#8220;Network$1.class&#8221; etc., as well as one class file for the abstract superclass, &#8220;Network.class&#8221;.</p>
<pre>public enum Network {
  facebook {
    public Client newClient() { return new FacebookClient(); }
  },
  linkedIn {
    public Client newClient() { return new LinkedInClient(); }
  };

  public abstract Client newClient();
}</pre>
<p>&#8220;facebook&#8221; and &#8220;linkedIn&#8221; are in fact different Java classes now.</p>
<p>Having a constructor taking parameters, and initializing each value of the enum by calling this constructor with values, is not sufficient to force the generation of individual classes per value.</p>
<p>Just because they are different classes in this situation doesn&#8217;t automatically mean you can do everything you&#8217;d expect to be able to do with a class. You can&#8217;t test for class membership using &#8220;instanceof&#8221; for example (not that this would be very useful for an enum, as there is only one instance of every value).</p>
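<p>The difference shows up at run time too. In the sketch below, which uses invented enum and class names, a value with its own body reports a class whose direct superclass is the enum type, whereas a plain value&#8217;s class is the enum type itself (whose superclass is java.lang.Enum).</p>
<pre>enum PlainNetwork { facebook, linkedIn }

enum ClientNetwork {
    facebook { String clientName() { return "FacebookClient"; } },
    linkedIn { String clientName() { return "LinkedInClient"; } };

    abstract String clientName();
}

final class EnumClassCheck {
    // True when the value's runtime class is a per-value subclass,
    // i.e. its direct superclass is the enum type rather than java.lang.Enum.
    static boolean hasOwnClass(Object enumValue) {
        return !enumValue.getClass().getSuperclass().equals(Enum.class);
    }
}</pre>
<p>hasOwnClass(PlainNetwork.facebook) is false and hasOwnClass(ClientNetwork.facebook) is true. getDeclaringClass() still returns ClientNetwork for both of its values, so that is the method to call when you want the enum type rather than the per-value class.</p>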
]]></content:encoded>
			<wfw:commentRss>http://www.databasesandlife.com/java-enum-classes/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PostgreSQL vs MySQL &#8220;update&#8221; difference</title>
		<link>http://www.databasesandlife.com/postgresql-vs-mysql-update-difference/</link>
		<comments>http://www.databasesandlife.com/postgresql-vs-mysql-update-difference/#comments</comments>
		<pubDate>Tue, 26 Mar 2013 08:21:02 +0000</pubDate>
		<dc:creator>adrian</dc:creator>
				<category><![CDATA[Databases]]></category>

		<guid isPermaLink="false">http://www.databasesandlife.com/?p=1534</guid>
		<description><![CDATA[While one session is doing a &#8220;delete then insert&#8221; and another session is doing an &#8220;update&#8221;, the update waits until the &#8220;delete then insert&#8221; is finished. But does the &#8220;update&#8221; then affect the &#8220;inserted&#8221; row or the &#8220;deleted&#8221; row? Create a table foo(id) and insert one row with id=5 into it. Then: Session A Session B [...]]]></description>
				<content:encoded><![CDATA[<p>While one session is doing a &#8220;delete then insert&#8221; and another session is doing an &#8220;update&#8221;, the update waits until the &#8220;delete then insert&#8221; is finished. <strong>But does the &#8220;update&#8221; then affect the &#8220;inserted&#8221; row or the &#8220;deleted&#8221; row?</strong></p>
<p>Create a table foo(id) and insert one row with id=5 into it. Then:</p>
<table class="hlines" style="width: 100%;">
<tbody>
<tr>
<td style="width: 50%;"><strong>Session A</strong></td>
<td style="width: 50%;"><strong>Session B</strong></td>
</tr>
<tr>
<td>START TRANSACTION</td>
<td>START TRANSACTION</td>
</tr>
<tr>
<td>DELETE FROM foo WHERE id=5</td>
<td></td>
</tr>
<tr>
<td></td>
<td>UPDATE foo SET id=6 WHERE id=5 <em>(blocks)</em></td>
</tr>
<tr>
<td>INSERT INTO foo VALUES (5)</td>
<td></td>
</tr>
<tr>
<td>COMMIT</td>
<td><em>(unblocks)</em></td>
</tr>
<tr>
<td></td>
<td>SELECT id FROM foo</td>
</tr>
</tbody>
</table>
<p>What does Session B see?</p>
<p>The result depends on your database vendor:</p>
<ul class="tight">
<li><strong>PostgreSQL:</strong> Row has id=5 (Session B&#8217;s update affected the deleted row)</li>
<li><strong>MySQL InnoDB:</strong> Row has id=6 (Session B&#8217;s update affected the inserted row)</li>
</ul>
<p>I don&#8217;t think I&#8217;d say either of these is the &#8220;right&#8221; approach, I think they are both valid. If two things happen at the same time, who&#8217;s to say which should win?</p>
<p>I would actually advise against deleting then inserting. I would <a title="“Just-in-time” inserting rows into a database" href="http://www.databasesandlife.com/jit-inserting-rows-into-a-db/">insert then, if a unique constraint violation is triggered, do an update</a>. But this doesn&#8217;t change the fact that, if you&#8217;re doing two things at the same time, you can&#8217;t attach meaning to which one should win.</p>
<p>BTW if you&#8217;re using an ORM like Hibernate, which claims to abstract away from the database, and allow you to use any vendor, do you think it takes differences like this into account?</p>
<p><em>Originally posted here:</em><br />
<a href="https://news.ycombinator.com/item?id=5435786">https://news.ycombinator.com/item?id=5435786</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.databasesandlife.com/postgresql-vs-mysql-update-difference/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Religious arguments, fear of the unknown</title>
		<link>http://www.databasesandlife.com/fear-of-the-unknown/</link>
		<comments>http://www.databasesandlife.com/fear-of-the-unknown/#comments</comments>
		<pubDate>Mon, 25 Mar 2013 10:07:15 +0000</pubDate>
		<dc:creator>adrian</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.databasesandlife.com/?p=1530</guid>
		<description><![CDATA[A friend said this to me today: it&#8217;s really odd, why do fans of dynamic languages often have this cult-like approach? It&#8217;s an interesting thought. I know many people who love dynamic typing and reject static typing without knowing much about it. (Of course there are many people who love dynamic typing but who do [...]]]></description>
				<content:encoded><![CDATA[<p>A friend said this to me today:</p>
<blockquote><p>it&#8217;s really odd, why do fans of dynamic languages often have this cult-like approach?</p></blockquote>
<p>It&#8217;s an interesting thought. I know many people who love dynamic typing and reject static typing without knowing much about it. (Of course there are many people who love dynamic typing but who do know lots about static typing.)</p>
<p>It got me to thinking, I reckon if you only know X and know nothing about non-X, you might start to fear non-X. Your fear might express itself by reinforcing your belief that X is the best, that non-X is no good.</p>
<p>I&#8217;m sure this is a standard human emotion, but I&#8217;ve absolutely no idea what it&#8217;s called. &#8220;Fear of the unknown&#8221; perhaps.</p>
<p>Anyway, an hour later I experienced a similar emotion when reading this:<br />
<a href="http://blogs.msdn.com/b/oldnewthing/archive/2013/03/20/10403718.aspx">http://blogs.msdn.com/b/oldnewthing/archive/2013/03/20/10403718.aspx</a></p>
<p>I have absolutely no idea what this article is on about. I know so little about Windows. I always deploy on UNIX or Linux. I always have. There&#8217;s so much to know about Windows. And all that stuff that there is to know: I don&#8217;t know any of it.</p>
<p>I reckon, when being hit with a volley of emotional, religious, and cult-like arguments about why all things other than X are bad, one should take into account the possibility that the arguer doesn&#8217;t know anything other than X, and is afraid of everything which would expose that lack of knowledge.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.databasesandlife.com/fear-of-the-unknown/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Generating XML programmatically? Don&#8217;t use CDATA</title>
		<link>http://www.databasesandlife.com/cdata/</link>
		<comments>http://www.databasesandlife.com/cdata/#comments</comments>
		<pubDate>Wed, 20 Mar 2013 08:12:26 +0000</pubDate>
		<dc:creator>adrian</dc:creator>
				<category><![CDATA[Coding]]></category>

		<guid isPermaLink="false">http://www.databasesandlife.com/?p=1491</guid>
		<description><![CDATA[If you want to represent characters, you want to represent any possible sequence of characters. In an XML file, escaping &#60; with &#38;lt; gives you a way to represent any character. Using CDATA doesn&#8217;t give you a way to do that. In an XML document there are two ways to represent character data. Either You [...]]]></description>
				<content:encoded><![CDATA[<p><strong>If you want to represent characters, you want to represent any possible sequence of characters. In an XML file, escaping &lt; with &amp;lt; gives you a way to represent any character. Using CDATA doesn&#8217;t give you a way to do that.</strong></p>
<p>In an XML document there are two ways to represent character data. Either</p>
<ol>
<li><strong>You just write the characters in the XML file</strong> (in which case you need to escape characters that look like XML tags i.e. &lt; with &amp;lt; etc.), or</li>
<li><strong>You use <a href="http://www.w3schools.com/xml/xml_cdata.asp">CDATA</a></strong>, no longer having to care about replacing &lt; with &amp;lt; etc. The CDATA section ends as soon as the first ]]&gt; is encountered, which is unlikely to appear in your characters.</li>
</ol>
<p>These two syntaxes result in an <strong>identical XML document</strong>. An XML parser <strong>must</strong> treat XML files that differ only in whether CDATA is used as identical. In both cases, there is text within a tag.</p>
<p>CDATA has a (<a href="http://stackoverflow.com/a/15455069/220627">somewhat strange</a>) syntax like:</p>
<pre>&lt;my-tag&gt;&lt;![CDATA[<strong>My characters</strong>]]&gt;&lt;/my-tag&gt;</pre>
<p>From its naming, CDATA (&#8220;Character DATA&#8221;) might seem like exactly what you need to represent character data, especially as characters such as &lt; don&#8217;t need to be escaped.</p>
<p>However, it&#8217;s in fact exactly the opposite of what you need. Without CDATA, it&#8217;s possible to escape the characters that mustn&#8217;t appear (i.e. replace &lt; with &amp;lt;); inside a CDATA section, there is <strong>no way</strong> to escape the characters that mustn&#8217;t appear (i.e. ]]&gt;).</p>
<p>It might seem &#8220;unlikely&#8221; that ]]&gt; is actually going to appear in the character data your user wishes to represent. (This is actually irrelevant, as software should work all the time, not just be &#8220;unlikely&#8221; not to work.) However, even this &#8220;unlikelihood&#8221; is misleading. No matter what sequence of characters XML had chosen to end CDATA, as soon as you represent data in itself (e.g. send an XML document in a tag), this sequence will appear in the data. So if you&#8217;re working with XML (which you are), this will happen more often than you think.</p>
<p>CDATA is just a <strong>convenience</strong> mechanism for writing XML files <strong>by hand</strong>. If you&#8217;re writing files by hand, you know which characters appear in your data, so you know whether you can use CDATA safely or not. If you&#8217;re creating a program to write XML files, you don&#8217;t know what the user&#8217;s data will contain, so you can&#8217;t use CDATA.</p>
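<p>A minimal sketch of the escaping approach in Java, using the JDK&#8217;s StAX XMLStreamWriter, whose writeCharacters method does the escaping for you. The element name my-tag is borrowed from the example above; the class and method names are invented for illustration.</p>
<pre>import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

final class XmlTextWriter {
    // Wraps arbitrary text in a my-tag element.
    // writeCharacters escapes markup characters itself, so no CDATA is involved.
    static String wrap(String text) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartElement("my-tag");
            xml.writeCharacters(text);
            xml.writeEndElement();
            xml.flush();
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not write XML", e);
        }
        return out.toString();
    }
}</pre>
<p>The program never has to inspect the user&#8217;s text or guess what it might contain: markup characters such as &lt; and &amp; are replaced with entity references as they are written.</p>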
]]></content:encoded>
			<wfw:commentRss>http://www.databasesandlife.com/cdata/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>&#8220;To programmize&#8221; (verb)</title>
		<link>http://www.databasesandlife.com/programmizing/</link>
		<comments>http://www.databasesandlife.com/programmizing/#comments</comments>
		<pubDate>Tue, 19 Mar 2013 08:11:17 +0000</pubDate>
		<dc:creator>adrian</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.databasesandlife.com/?p=1503</guid>
		<description><![CDATA[I was talking to someone who doesn&#8217;t work in IT, and who isn&#8217;t a native English speaker. She was talking about various people in software development, and referred to them doing &#8220;programmizing&#8221; (as opposed to &#8220;programming&#8221;). That&#8217;s a great concept. &#8220;Programmizing&#8221; reminds me of &#8220;terrorizing&#8221;. As in: before you have a bunch of processes involving paper and [...]]]></description>
				<content:encoded><![CDATA[<p>I was talking to someone who doesn&#8217;t work in IT, and who isn&#8217;t a native English speaker. She was talking about various people in software development, and referred to them doing &#8220;programmizing&#8221; (as opposed to &#8220;programming&#8221;).</p>
<p>That&#8217;s a great concept. &#8220;Programmizing&#8221; reminds me of &#8220;terrorizing&#8221;. As in: before, you have a bunch of processes involving paper and people and it all works. Then, some people come in and do some programmizing. Afterwards, nothing works, everything&#8217;s slow, crashes, has bugs, loses data, you have to turn stuff off and on a lot&#8230;.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.databasesandlife.com/programmizing/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Map keys: Only allow strings, don&#8217;t allow &#8220;any object&#8221;</title>
		<link>http://www.databasesandlife.com/map-keys/</link>
		<comments>http://www.databasesandlife.com/map-keys/#comments</comments>
		<pubDate>Mon, 18 Mar 2013 07:57:13 +0000</pubDate>
		<dc:creator>adrian</dc:creator>
				<category><![CDATA[Software Design]]></category>

		<guid isPermaLink="false">http://www.databasesandlife.com/?p=1493</guid>
		<description><![CDATA[Associative arrays. Maps. Hashes. Dictionaries. They are an important cornerstone of data modelling in software. Every language has them. Question: What types should the keys be? The answer depends on what language you&#8217;re using. I have spent a lot of time programming Perl and Java, and they answer this question differently. Perl&#8217;s hashes force you [...]]]></description>
				<content:encoded><![CDATA[<p><strong>Associative arrays. Maps. Hashes. Dictionaries. They are an important cornerstone of data modelling in software. Every language has them. </strong></p>
<p><strong>Question: What types should the keys be?</strong></p>
<p>The answer depends on what language you&#8217;re using. I have spent a lot of time programming Perl and Java, and they answer this question differently. <a href="http://perldoc.perl.org/perldata.html#Variable-names">Perl&#8217;s hashes</a> force you to have string keys. <a href="http://docs.oracle.com/javase/7/docs/api/java/util/Map.html">Java&#8217;s Map</a> allows you to have any objects as keys, and you can override (amongst others) the equals method to allow the map to know if two keys are the same or not. <a href="http://www.sgi.com/tech/stl/Map.html">C++ STL&#8217;s &#8220;map&#8221;</a> is the same as Java, but in addition allows you to supply code, when you create the map, to determine if two key objects are the same or not.</p>
<p>The Java and C++ STL ways definitely seem to be more general. But after years of programming in both worlds, I&#8217;ve come to a conclusion: The way Perl does it, allowing only strings as keys, is better.</p>
<p>It&#8217;s obvious whether two strings are equal or not. It&#8217;s not obvious whether two arbitrary objects are equal or not.</p>
<p>Take a User object. This represents a user in the system. It has an id, a name, a picture and so on. You want to store information against users, so you create a Map with Users as keys, and the information as values. The code Map&lt;User, MyInfo&gt; reads easily enough. Adding entries to and fetching entries from this map reads easily enough.</p>
<p>But what does it actually do? Under what circumstances are two User objects equal? When their IDs are equal? When all fields are equal? When certain key fields such as names are equal? What if two users haven&#8217;t been inserted into the database yet and thus don&#8217;t have IDs yet?</p>
<p>This is a whole can of worms. I have encountered subtle bugs due to object equality not being what I expect it to be in a certain situation (even though it looked reasonable enough when the class was defined). These bugs have been difficult to trace and fix, and they aren&#8217;t always the sort of bug that manifests quickly or every time.</p>
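<p>As a minimal sketch of one such bug, assume a hypothetical User class whose equals and hashCode cover all fields (a choice that looks reasonable in isolation). The class, its fields and the ObjectKeyPitfall helper are invented for illustration. Renaming the user after it has been used as a key makes the entry unreachable:</p>

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical User class: equality is defined over all fields.
final class User {
    final Long id;   // would be null until the user is inserted into the database
    String name;

    User(Long id, String name) {
        this.id = id;
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof User)) return false;
        User other = (User) o;
        return Objects.equals(id, other.id) && Objects.equals(name, other.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, name);
    }
}

class ObjectKeyPitfall {
    // Stores information against a user, renames the user, then looks it up again.
    static String lookupAfterRename() {
        Map<User, String> info = new HashMap<>();
        User user = new User(42L, "alice");
        info.put(user, "some info");
        user.name = "alice2";    // the key's hash code no longer matches the stored entry
        return info.get(user);   // the very same object is no longer found
    }

    public static void main(String[] args) {
        System.out.println(lookupAfterRename()); // prints "null"
    }
}
```

<p>Nothing in the line info.get(user) hints at the problem; it only shows up once you know how equals and hashCode were written.</p>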
<p>Adding information to the map with map.add(user.id, myInfo) is also easy to read. You know exactly what it does. You don&#8217;t have to program those annoying equals and hashCode methods, so your business logic becomes less diluted by implementation artefacts. Those methods also can&#8217;t have bugs if you haven&#8217;t programmed them.</p>
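<p>A sketch of the string-keyed alternative in Java, where the map method is called put. The User class, its string id and the StringKeyedInfo helper are assumptions made for illustration. The User class needs no equals or hashCode, and changing other fields has no effect on the lookup:</p>

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical User class: no equals or hashCode to write, and none to get wrong.
final class User {
    final String id;
    String name;

    User(String id, String name) {
        this.id = id;
        this.name = name;
    }
}

class StringKeyedInfo {
    // Stores information against the user's id, renames the user, then looks it up again.
    static String lookupAfterRename() {
        Map<String, String> infoByUserId = new HashMap<>();
        User user = new User("42", "alice");
        infoByUserId.put(user.id, "some info");
        user.name = "alice2";                // irrelevant to the key
        return infoByUserId.get(user.id);    // still found
    }

    public static void main(String[] args) {
        System.out.println(lookupAfterRename()); // prints "some info"
    }
}
```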
<p>A disadvantage: If you&#8217;re using a language supporting static typing and generics, then extra readability and compiler-verification can be gained by stating the type of the key of a relationship. Just using &#8220;String&#8221; doesn&#8217;t allow this (is it a user id? the name of the user?). But I still think that the simplicity aspect described above is more important.</p>
<p>Perhaps one could argue that numbers should be allowed as keys too. But I think, for simplicity, there should only be one type of key. A number can be converted to a string easily enough (as long as one makes sure to remove any leading zeros, which would make two identical numbers appear to be different strings). More complex data, that one might want as a key, can be more easily converted to a string than to a number.</p>
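<p>One way to do that conversion, sketched in Java: parse the number and print it again, so that every spelling of the same number yields the same string. The NumericKeys class and its key methods are hypothetical names, and the sketch assumes whole numbers only.</p>

```java
import java.math.BigInteger;
import java.util.HashMap;
import java.util.Map;

class NumericKeys {
    // Canonical key for a whole number that arrives as text: "007" and "7" both become "7".
    static String key(String digits) {
        return new BigInteger(digits.trim()).toString();
    }

    // Canonical key for a value that is already numeric.
    static String key(long number) {
        return Long.toString(number);
    }

    public static void main(String[] args) {
        Map<String, String> nameByNumber = new HashMap<>();
        nameByNumber.put(key("007"), "Bond");
        System.out.println(nameByNumber.get(key(7))); // prints "Bond"
    }
}
```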
<p><strong>Regarding sets and finding unique objects,</strong> the situation is similar. Perl has no facility to do this. Java has the ability to add objects to a <a href="http://docs.oracle.com/javase/7/docs/api/java/util/Set.html">Set</a>; the elements&#8217; equals method is used to determine if objects are the same or not. <a href="http://www.sgi.com/tech/stl/set.html">C++ STL&#8217;s &#8220;set&#8221;</a> is similar to Java&#8217;s, but also allows you to supply a comparison function to determine if two elements are the same or not.</p>
<p>Again, if you want to find which (unique) users certain information pertains to, in Java you&#8217;d create a Set&lt;User&gt; and put the users in there. Again, what makes one User the same as another? Again, subtle bugs can be introduced when your understanding of what makes a user unique doesn&#8217;t coincide with that of the author of the User class.</p>
<p>This is the way Perl does it: There are only maps, no sets. You place objects in the map as values, and the unique aspect is the key of the map. At the end you just take all the values of the map, and ignore the keys. map.add(user.id, user) is easy to write, and it&#8217;s clear what&#8217;s going on.</p>
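<p>The same idea can be sketched in Java, using a map keyed by id in place of a set. The User class and the UniqueUsers helper are hypothetical; LinkedHashMap is used only so that the result keeps first-seen order.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical User class without equals or hashCode.
final class User {
    final String id;
    final String name;

    User(String id, String name) {
        this.id = id;
        this.name = name;
    }
}

class UniqueUsers {
    // Returns the names of the distinct users, where "distinct" explicitly means "distinct id".
    static List<String> uniqueNames(List<User> users) {
        Map<String, User> byId = new LinkedHashMap<>();
        for (User u : users) {
            byId.put(u.id, u);   // a repeated id replaces the earlier value and keeps its position
        }
        List<String> names = new ArrayList<>();
        for (User u : byId.values()) {   // take the values, ignore the keys
            names.add(u.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<User> users = Arrays.asList(
                new User("1", "alice"), new User("2", "bob"), new User("1", "alice"));
        System.out.println(uniqueNames(users)); // prints "[alice, bob]"
    }
}
```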
]]></content:encoded>
			<wfw:commentRss>http://www.databasesandlife.com/map-keys/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Don&#8217;t name classes AbstractThing or ISerializable</title>
		<link>http://www.databasesandlife.com/dont-names-classes-abstractthing-or-iserializable/</link>
		<comments>http://www.databasesandlife.com/dont-names-classes-abstractthing-or-iserializable/#comments</comments>
		<pubDate>Mon, 25 Feb 2013 08:14:35 +0000</pubDate>
		<dc:creator>adrian</dc:creator>
				<category><![CDATA[Coding]]></category>

		<guid isPermaLink="false">http://www.databasesandlife.com/?p=1475</guid>
		<description><![CDATA[Naming a class AbstractThing violates the Liskov Substitution Principle, and causes client code to read wrong. Call a class a name which you&#8217;d be happy to call any of the objects belonging to the class. If you want to borrow your friend&#8217;s phone, and you&#8217;re not sure what brand it is, how many times have [...]]]></description>
				<content:encoded><![CDATA[<p><b>Naming a class AbstractThing violates the Liskov Substitution Principle, and causes client code to read wrong. Call a class a name which you&#8217;d be happy to call any of the objects belonging to the class.</b></p>
<p>If you want to borrow your friend&#8217;s phone, and you&#8217;re not sure what brand it is, how many times have you asked &#8220;Excuse me mate, can I borrow your abstract phone? Cheers.&#8221;</p>
<p>Object-oriented modeling and programming allow certain classes of objects to specialize other general classes of objects. For example, both a Rectangle and a Circle are special types of Shape.</p>
<p>Every Rectangle is a Shape. Everywhere you can use a Shape, you can use a Rectangle. Code that deals with a Rectangle (but which can also deal with Circles) might look like:</p>
<pre>Shape s = .....;
s.drawTo(myGraphicsContext);</pre>
<p>Using an object (e.g. a Rectangle) anywhere you can use a generalization of it (e.g. a Shape) is an important part of the object-oriented concept, and known as the <a href="http://stackoverflow.com/questions/56860/what-is-the-liskov-substitution-principle">Liskov Substitution Principle</a>.</p>
<p>It&#8217;s also obvious: what sort of sentence or logic would make statements about shapes, but then not be applicable to rectangles?</p>
<p>If you name the generalization an AbstractShape, this principle is violated. A Rectangle isn&#8217;t an AbstractShape. If anything it&#8217;s a &#8220;Concrete Shape&#8221;! A rectangle isn&#8217;t abstract (in the sense of &#8220;I don&#8217;t know what type of shape this is. Could be a rectangle, could be anything else.&#8221;). Code using AbstractShape then reads wrong:</p>
<pre>AbstractShape s = new Rectangle(...);</pre>
<p>The same is true of interfaces. What do all objects which are serializable have in common? They are Serializable. Naming the interface ISerializable would lead to code like:</p>
<pre>ISerializable s = new String("hello");</pre>
<p>But s refers to a particular instance; a particular thing. There&#8217;s nothing abstract or &#8220;just an interface&#8221; about s. s is a String, which is a specialization of the more general type of thing called a Serializable. The line of code, including s&#8217;s type, must reflect what s actually is. It&#8217;s a serializable.</p>
<p>Further, what if you change an interface to an abstract class, or a concrete class to an abstract class, or vice-versa? The client code must be updated. While this isn&#8217;t too much of a pain with refactoring tools, it still leads to the suspicion that irrelevant information about the tools used to construct the implementation is leaking out to client code.</p>
<p>A class of a set of objects is named to capture the common aspects of those objects. If all objects are rectangles, call their class Rectangle. Naming a superclass, abstract or not, is no different. What do all the objects modeled have in common? Some are rectangles, some are circles, but they are all Shapes.</p>
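<p>By contrast, here is a rough sketch of client code written against a plainly named Shape. The area method and the ShapeClient helper are assumptions made for illustration. Shape happens to be an interface here, but turning it into an abstract class later would not require renaming anything that totalArea or its callers refer to:</p>

```java
// Shape is an interface in this sketch; its name says nothing about that.
interface Shape {
    double area();
}

final class Rectangle implements Shape {
    private final double width;
    private final double height;

    Rectangle(double width, double height) {
        this.width = width;
        this.height = height;
    }

    @Override
    public double area() {
        return width * height;
    }
}

final class Circle implements Shape {
    private final double radius;

    Circle(double radius) {
        this.radius = radius;
    }

    @Override
    public double area() {
        return Math.PI * radius * radius;
    }
}

class ShapeClient {
    // Works with any mix of shapes; every Rectangle and every Circle is a Shape.
    static double totalArea(Shape... shapes) {
        double sum = 0;
        for (Shape s : shapes) {
            sum += s.area();
        }
        return sum;
    }

    public static void main(String[] args) {
        Shape s = new Rectangle(2, 3);
        System.out.println(totalArea(s, new Rectangle(1, 1))); // prints "7.0"
    }
}
```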
<p>The <b>objects</b> being classified into classes are not AbstractShapes, BaseShapes, IShapes, or similar.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.databasesandlife.com/dont-names-classes-abstractthing-or-iserializable/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>
