Saturday, November 13, 2010

The copyright part of Oracle vs. Google

In my previous reporting on Oracle vs. Google, I have naturally focused on the patent infringement claims brought forward by Oracle against Dalvik, the virtual machine for Google's Android operating system. After an initial reaction, I looked at licensing issues, the Open Invention Network's failure, Google's stance on software patents, international equivalents of the patents-in-suit, the FSF's belated and misleading statement, and possible inaccuracies in Google's description of the history of the Java standard.

However, in addition to allegations that Google violates seven Java patents, Oracle also asserts copyright infringement. Patents and copyrights have a lot in common -- they are both intellectual property rights and both relevant to software -- but there are also significant differences in terms of the scope of protection.

Let's take a closer look now at the copyright part of the lawsuit from the procedural as well as the substantive angle.

Oracle provided some more clarity in response to a Google motion to dismiss the copyright infringement claim

If a company sues another over the infringement of intellectual property rights, it has to be somewhat specific in its complaint but there's a limit to the level of detail that is required at that stage. Most of this is left to the fact-finding stage of the proceeding.

I don't want to get (at least not now) into the debate whether some more specificity should be required. Gene Quinn, the author of the IPWatchdog blog, expressed a critical view of the status quo in this post. Note the final paragraph. In that one, he admits that he can't blame any particular plaintiff for playing by the rules:

"If you can initiate a lawsuit without any information to actually inform the defendants as to what they are doing that might be infringing then you might as well. If the Courts will allow you to file a complaint that says I think you are infringing but I’m not about to tell you how or why I think that; go figure it out yourself, then you might as well. [...] There is an enormous strategic advantage to keeping the defendants guessing and in the dark for as long as possible."

Basically, that's what Oracle tried as well. In its original complaint against Google, filed on 12 August 2010, Oracle listed seven counts of patent infringement (one patent per count) and added an eighth count on copyright infringement. That one, however, didn't specify the infringed rights and the infringing material too clearly. In item 38, Oracle claimed that Java is protected by copyright. In item 39, Oracle then claimed that "Google's Android infringes Oracle America's copyrights in Java and Google is not licensed to do so." In item 40, Oracle stated that "users of Android, including device manufacturers, must obtain and use copyrightable portions of the Java platform or works derived therefrom to manufacture and use functioning Android devices. Such use is not licensed. [...]"

That claim just specified that Android contains software that infringes Oracle's (originally, Sun's) copyright in Java. But Android and Java are two large pieces of software, and on 4 October 2010, Google filed (in parallel to its answer to the complaint) a motion to dismiss the count on copyright infringement "or, in the alternative, for a more definite statement". Among other things, Google wrote:

"Oracle's Complaint includes impermissibly vague and broad allegations of copyright infringement. In particular, the Complaint does not specifically identify any allegedly infringing works of Google, how Google has allegedly infringed Oracle’s rights in the two Sun works attached to the Complaint, or how Oracle believes its claim of vicarious liability for copyright infringement arises."

The short version of the above is "show me, I'm from Missouri."

Oracle didn't want to risk that its copyright infringement claim might be dismissed. On 27 October 2010 it filed an amended complaint with a significantly more specific Count VIII. To document an example of the alleged copyright infringement, Oracle attached a new Exhibit J juxtaposing code excerpts from Oracle/Sun's Java codebase to excerpts from a part of the Android codebase. Oracle made that filing one day before a deadline by which it would have had to reply to Google's motion.

The court formally asked Google (on 28 October 2010) to clarify whether (and if so, on what basis) it still believed Oracle's copyright infringement claim should be dismissed. On 1 November 2010, Google signaled that it could live with the amended complaint, which resulted -- still on the same day -- in the court's denial as moot of Google's motion to dismiss Count VIII. There was no more need to decide whether Google's motion was meritorious when filed: it had just become pointless in the meantime.

On 10 November 2010, Google submitted its answer to Oracle's amended complaint. In that one, Google denies copyright infringement and brings up every conceivable defense (without providing any detail that would support those defenses).

Oracle probably has a point about copyright infringement (at least to a certain degree)

At this stage it would be premature for me to agree with Oracle on its copyright infringement claim. While Oracle's amended complaint provides some more information, it still doesn't specify everything that one needs to know. As you saw further above in my quote from Gene Quinn, it's in a plaintiff's interest to keep their cards close to their chest for as long as possible.

But based on the information that has already been provided, I would be very surprised if Oracle's copyright infringement assertion was dismissed entirely.

Maybe Oracle will find it difficult to prove the full extent of its allegations (including that Google acted "knowingly" and "willingly"). Some of that may depend on fact-finding. But even in the event that Google is found less culpable than Oracle claims, there might still be a copyright infringement, which would entitle Oracle to all sorts of relief that would be devastating for Google and Android at any rate.

There's an important thing to bear in mind here: a copyright infringement (if there is one) is illegal and must be stopped. Simple as that.

"Others did it" doesn't matter too much -- and it particularly doesn't help app developers

Google's Sixteenth Defense (item 24 of its answer to the amended complaint) states the following:

"Any use in the Android Platform of any protected elements of the works that are the subject of the Asserted Copyrights was made by third parties without the knowledge of Google, and Google is not liable for such use."

However, claims that "if anything was wrong, others did it" (which is the short version of the above) don't mean Google isn't liable in any way. We're talking about a potential difference that would only be gradual in terms of some of the consequences for Google, such as the amounts of possible damage awards. But from the perspective of an Android developer, a defense like "others did it" doesn't make any difference at all because if there is an infringement, Oracle can stop Google (and anyone else!) from continuing to make such software available, with all it entails for the Android application ecosystem.

What I just explained is something that the other reports I saw on this didn't really point out. As an app developer, the last thing I would worry about is whether Google has to pay $50 million, $500 million or $5 billion. I would only be concerned about whether my application will still have a legally safe basis on which to run, or whether I would have to make modifications or even undertake a complete rewrite (possibly in a different programming language) when all of this is over.

In item 40 of its amended complaint, Oracle states (among other things) the following:

"Android includes infringing class libraries and documentation. Approximately one third of Android’s Application Programmer Interface (API) packages (available at http://developer.android.com/reference/packages.html) are derivative of Oracle America's copyrighted Java API packages (available at http://download-llnw.oracle.com/javase/1.5.0/-docs/api/ and http://download-llnw.oracle.com/javase/1.4.2/docs/api/) and corresponding documents."

The website on which the allegedly infringing material has been made available, android.com, is a Google website. In item 12 of its answer to the amended complaint, Google points to the "Open Handset Alliance, a group of 78 technology and mobile companies that includes Google". So what? Oracle doesn't appear to hold Google responsible for the content of the Open Handset Alliance website. If Oracle wanted, it could make the same claims against other members of the Android ecosystem, but for whatever reasons (such as existing business relationships) Oracle decided to focus on Google, at least for now. And android.com, to which Oracle's allegations relate, is a Google website, which is also made perfectly clear by the site's terms of service. Google isn't allowed to distribute infringing material.

Code similarities indicate that some infringement has indeed occurred

I have taken a detailed look at the code excerpts provided by Oracle in Exhibit J of its amended complaint. If I had written the original Java code shown in the left column of that juxtaposition and if I then saw the Android code in the right column, I, too, would claim that my copyright has been violated.

It's important to know that Oracle provided that exhibit only as an example. Item 40 of Oracle's amended complaint claims that "[i]n at least several instances, Android computer program code also was directly copied from copyrighted Oracle America code. [...] not just in name, but in the source code on a line-for-line basis."

Some commentators have overstated the importance of Google's denial of the correctness of that piece of evidence, which is found in item 40 of Google's answer to the amended complaint:

"[...] Google further denies that the document attached to Oracle's Amended Complaint as Exhibit J contains a true and correct copy of a class file from either Android or “Oracle America's Java.” Google states further that Oracle has redacted or deleted from the materials shown in Exhibit J both expressive material and copyright headers that appear in the actual materials, which are significant elements and features of the files in question."

There's nothing spectacular about that. It doesn't mean that Google believes Oracle made something up entirely. Actually, Oracle's Exhibit J makes it perfectly clear that it isn't an unmodified copy of the relevant code segments. The headline of the left column states "[comments removed and spacing adjusted for comparison]", and the headline of the right column: "[spacing adjusted for comparison]". So by denying that this is a "true and correct copy", Google simply keeps the option open to claim later that whatever Oracle redacted was so very relevant it shouldn't have been left out. However, if Google could prove that Oracle forged a document, I'm sure it would already have said so. Realistically, Oracle is a tough company but it certainly doesn't engage in any criminal activity. So let's not overstate Google's unspecified denial.

I can't imagine that the original versions of those files would render Oracle's claim invalid. The part that Oracle provides shows what really appears to be an infringement.

This is getting very technical now, but let me explain what Exhibit J shows in terms of copying of code.

The interface

The PolicyNodeImpl.java files implement (the "Impl" in the file name stands for "implementation") the PolicyNode interface, which is part of the java.security.cert library. That interface describes the structure of a tree node (an item in a tree-structured collection of data) "as defined by the PKIX certification path validation algorithm." PKIX appears to be a public-key infrastructure standard defined by the Internet Engineering Task Force (IETF).

That interface is a concise rather than talkative one. It mandates seven methods that basically appear to be read-only properties: getChildren (returns child nodes in the form of a thread-safe iterator), getDepth (returns an integer value of the depth of the given node within the tree), getExpectedPolicies (returns a set of strings describing "expected" policies), getParent (returns the parent node as an instance of the same class or null for the root node), getPolicyQualifiers (a set of instances of the PolicyQualifierInfo class), getValidPolicy (returns a string describing the valid policy represented by the given node), and isCritical (returns a Boolean value as a "criticality indicator").
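For readers who want to see how small that interface is, here is a minimal sketch that lists its methods by reflection on a standard JDK. The class name PolicyNodeSurface is my own, hypothetical choice; only java.security.cert.PolicyNode comes from the Java platform itself.

```java
import java.lang.reflect.Method;
import java.security.cert.PolicyNode;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper class, written only for illustration.
class PolicyNodeSurface {
    // Collects the names of the methods declared by the PolicyNode interface.
    static Set<String> methodNames() {
        Set<String> names = new TreeSet<>();
        for (Method method : PolicyNode.class.getDeclaredMethods()) {
            names.add(method.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Prints one method name per line, in alphabetical order.
        for (String name : methodNames()) {
            System.out.println(name);
        }
    }
}
```

Anyone implementing the interface has to provide exactly these methods; everything else (private fields, helper methods, local variable names) is up to the implementer.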

It's obvious that two programmers independently implementing such a limited interface may use a number of coding patterns that are similar. However, there comes a point when similarities between two implementations are so far-reaching that it would be easier to win a lottery every week for several years in a row than to come up just by coincidence with such similar code segments through totally independent creation. In other words, at some point it becomes a safe assumption that a copycat was at work, somewhere. Maybe not at Google itself. Maybe not at the Apache Foundation (which has already disowned the code segment in question). But someone somewhere apparently acted as a copycat and somehow tried to cover up what happened but didn't really do a great job at that. The copying that has taken place is, at best, thinly veiled.

Initial declarations and constructor

The declarations at the start of the class definition are identical in both files, including the variable names. Given that those are private variables (as opposed to being part of the public interface definition), they could have been chosen freely, and then it's very unlikely that both files would contain identical variable names in that part of the code.

The constructors obviously have structurally identical parameter lists. However, the Android version of the code replaces all of the parameter names. Such replacements of variable names occur throughout the code shown by Oracle. In the vast majority of those cases, the original Java code uses more descriptive and meaningful names, whereas the variable names were apparently replaced with less informative ones for the Android version.

For example, the two sets named "qualifierSet" and "expectedPolicySet" in Oracle/Sun's code became "set" and "set1" in the parameter list of the Android version. Given that those are sets of different data types and that professional programmers generally avoid variable names with numbers in them, I have a strong suspicion that this is an approach to naming that makes sense only for the purpose of concealing a copyright infringement.
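To make the naming point concrete, here is a small sketch of my own. It is not the code from Exhibit J: the classes DescriptiveNode and NumberedNode, their fields, and the element types Integer and String are hypothetical stand-ins. Only the four parameter names are taken from the description above.

```java
import java.util.Set;

// Hypothetical stand-in: parameter names say what each set holds.
class DescriptiveNode {
    final Set<Integer> qualifiers;
    final Set<String> expectedPolicies;

    DescriptiveNode(Set<Integer> qualifierSet, Set<String> expectedPolicySet) {
        this.qualifiers = qualifierSet;
        this.expectedPolicies = expectedPolicySet;
    }
}

// Hypothetical stand-in: same logic, but the names carry no meaning.
class NumberedNode {
    final Set<Integer> qualifiers;
    final Set<String> expectedPolicies;

    NumberedNode(Set<Integer> set, Set<String> set1) {
        // A reader has to check the types to tell the two sets apart.
        this.qualifiers = set;
        this.expectedPolicies = set1;
    }
}

class NamingDemo {
    public static void main(String[] args) {
        DescriptiveNode a = new DescriptiveNode(Set.of(1), Set.of("policy"));
        NumberedNode b = new NumberedNode(Set.of(1), Set.of("policy"));
        // Both objects hold the same data; only the source text differs.
        System.out.println(a.expectedPolicies.equals(b.expectedPolicies));
    }
}
```

The compiler treats both constructors the same way. The difference exists only for human readers and for text comparison tools.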

Hypothetically, someone might have just had a preference for shorter names, but a programmer always spends only a limited amount of time typing and a lot of time trying to figure out what the code does, which is why descriptive variable names are the norm and the naming approach of that Android code is, to say the least, highly unusual.

The code of the constructor then processes the parameters in the order in which they appear in the parameter list. In terms of the actual commands, both constructors do exactly the same. The only other difference (besides the slightly different variable names) is that the Android version makes more extensive use of braces in places where they aren't syntactically required. Apparently, the original version of the code uses braces in if/else constructs only if one of the blocks of code has more than one line, while the Android version of the code uses braces at any rate.

Besides the main constructor, there's also a shorter version. That one only has two parameters and then calls the main constructor. Both implementations use different variable names but they have the same logic. The Android version contains some type conversions that appear redundant to me. Again, I don't know who did that and I don't know why, but introducing unnecessary type conversions (converting data into the type in which it's already provided) could be another way of trying to make things harder for "diff" (line-by-line text file comparer) tools.
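As an illustration of what I mean by a redundant type conversion, consider the following sketch. It is my own hypothetical example (the class CastDemo and its methods do not come from either codebase) and only shows the general pattern of casting a value to the type it already has.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical example of a cast that changes the text but not the behavior.
class CastDemo {
    // Plain version: the argument already has the declared type.
    static Set<String> plain(Set<String> policies) {
        return new HashSet<>(policies);
    }

    // Same behavior, with a cast to the type the value already has.
    static Set<String> withRedundantCast(Set<String> policies) {
        return new HashSet<>((Set<String>) policies);
    }

    public static void main(String[] args) {
        Set<String> input = Set.of("a", "b");
        // Both methods return equal sets.
        System.out.println(plain(input).equals(withRedundantCast(input)));
    }
}
```

A line-by-line comparison would flag the two method bodies as different even though they do the same thing.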

Patterns in the choice of variable names and the loop structures

Throughout the code (except for the declarations at the very start of the class definition) one can see the difference in variable naming conventions that I explained for the constructors. Many variable names are identical in both code excerpts, but where there are differences, they are usually of the kind I described, and the choices made in the Android version don't smell right.

The Oracle version of the code has a logical priority in terms of when to choose longer variable names (where descriptiveness is needed) and when to use shorter ones (such as for loop variables). In the Android version it just doesn't make sense that short names are used where descriptiveness is needed, but a loop variable is called "iterator" when loop variables are usually just one or two characters long (Oracle's corresponding code uses "it" for an iterator that serves as a loop variable). It strikes me that the Android version, presumably in an attempt to just be different from Oracle's code, runs completely counter to the naming conventions and principles of competent programmers.

Another consistent morphological pattern is that the Android version uses different loop structures (for loops that have the same body in terms of what they do). Some examples:

  • In Oracle's code, the "setImmutable" method of the "isImmutable" Boolean property uses a while loop for the iteration, while the Android version uses a for loop. Again, the Oracle version does what makes logical sense in terms of source code readability. To me that loop is a clear case of a while loop. A for loop would make more sense if you have a loop variable that gets incremented and where you usually exit at a predefined point.

  • The "prune" method in Oracle's code uses a while loop, while the Android version uses a do...while(true) construct that I find, frankly, quite horrible and again implausible. It's longer and less readable. It also makes use of the break command to exit the loop if the iterator has reached its end. I admit that I personally like that kind of command as well even though some purists among computer scientists are categorically opposed to it. However, this loop isn't an example of where that makes sense. Oracle's while loop checks right at the start whether the iterator still has items available that can be processed. That way, there's no need to break out of the loop. Also, a while(true) at the end of a loop is unusual. By doing so, the only way one can get out of the loop is with a break command. I could imagine such a structure in some scenarios (in which there would be at least two and usually more break commands), but this isn't one in which it would make sense (other than for the sole purpose of disguising the unauthorized use of Oracle's code).
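The three loop shapes discussed in the list above can be sketched as follows. This is a hypothetical example of my own (the class LoopShapes, its methods, and the upper-casing loop body are assumptions made purely for illustration) and not code from either file.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical example: one loop body, three different loop shapes.
class LoopShapes {
    // while loop: the exit condition is tested before each pass.
    static List<String> withWhile(List<String> items) {
        List<String> out = new ArrayList<>();
        Iterator<String> it = items.iterator();
        while (it.hasNext()) {
            out.add(it.next().toUpperCase());
        }
        return out;
    }

    // for loop with an empty update clause: same behavior, different shape.
    static List<String> withFor(List<String> items) {
        List<String> out = new ArrayList<>();
        for (Iterator<String> iterator = items.iterator(); iterator.hasNext();) {
            out.add(iterator.next().toUpperCase());
        }
        return out;
    }

    // do...while(true): the break is the only way out of the loop.
    static List<String> withDoWhileTrue(List<String> items) {
        List<String> out = new ArrayList<>();
        Iterator<String> iterator = items.iterator();
        do {
            if (!iterator.hasNext()) {
                break;
            }
            out.add(iterator.next().toUpperCase());
        } while (true);
        return out;
    }

    public static void main(String[] args) {
        List<String> items = List.of("a", "b");
        // All three variants produce the same list.
        System.out.println(withWhile(items).equals(withFor(items))
                && withFor(items).equals(withDoWhileTrue(items)));
    }
}
```

All three methods return the same result for any input, including an empty list. The while variant is the shortest, and it states its exit condition where a reader expects to find it.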

Another oddity in the Android implementation of that interface is found in the "deleteChild" method. Unlike the Oracle version, the Android version has a return command at the end of the else block. However, there's no other code that would be executed, so the return command is just redundant.
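Here is a minimal sketch of what such a redundant return looks like. The class RedundantReturnDemo, the "locked" flag, and the list of strings are hypothetical and chosen only to keep the example short; I am not reproducing the method from Exhibit J.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example of a return statement that has no effect.
class RedundantReturnDemo {
    // Plain version: the method simply ends after the else block.
    static void remove(List<String> children, String child, boolean locked) {
        if (locked) {
            throw new IllegalStateException("node is locked");
        } else {
            children.remove(child);
        }
    }

    // Same behavior: the return is the last statement that could run anyway.
    static void removeWithTrailingReturn(List<String> children, String child, boolean locked) {
        if (locked) {
            throw new IllegalStateException("node is locked");
        } else {
            children.remove(child);
            return;
        }
    }

    public static void main(String[] args) {
        List<String> first = new ArrayList<>(List.of("x", "y"));
        List<String> second = new ArrayList<>(List.of("x", "y"));
        remove(first, "x", false);
        removeWithTrailingReturn(second, "x", false);
        // Both lists end up in the same state.
        System.out.println(first.equals(second));
    }
}
```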

What also strikes me as highly unusual is the use of i as a parameter name for the "getPolicyNodes" method. In Oracle's version, that integer parameter is called "depth", which makes sense and is descriptive. Usually, programmers reserve i exclusively for the use of a loop variable name.

The "getPolicyNodesExpected" method in Oracle's version performs a comparison with an ID defined at the start of the source file in the ANY_POLICY static variable. Even though the Android version also defines that static variable and assigns the same name to it, its "getPolicyNodesExpected" implementation contains a literal ("2.5.29.32.0") although it normally should and would use the static never-changing variable. The same pattern is found in the implementations of the "policyToString" member. Again, the most plausible explanation is that someone violated clean code conventions only in order to create an artificial difference between his code and Oracle's original code.
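The difference between using the constant and repeating the literal can be sketched like this. The class PolicyConstantDemo and its two methods are hypothetical; only the name ANY_POLICY and the string "2.5.29.32.0" are taken from the description above.

```java
// Hypothetical example: a named constant versus a repeated literal.
class PolicyConstantDemo {
    // Defined once; every use site refers to it by name.
    static final String ANY_POLICY = "2.5.29.32.0";

    // Clean version: the comparison names the constant.
    static boolean matchesViaConstant(String policy) {
        return ANY_POLICY.equals(policy);
    }

    // Same behavior, but the value is spelled out again even though
    // the constant is available in the same class.
    static boolean matchesViaLiteral(String policy) {
        return "2.5.29.32.0".equals(policy);
    }

    public static void main(String[] args) {
        // Both checks agree for a matching and a non-matching value.
        System.out.println(matchesViaConstant("2.5.29.32.0") == matchesViaLiteral("2.5.29.32.0"));
        System.out.println(matchesViaConstant("1.2.3") == matchesViaLiteral("1.2.3"));
    }
}
```

Both methods behave identically. The literal version merely gives up the maintainability benefit of having the value defined in one place.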

Finally, the "asString" function: both implementations use a stringbuilder, which is a more performant way of composing strings than string variables if a number of strings are concatenated. Since almost every line in that function starts with a reference to that stringbuilder instance, this is similar to a loop variable and one would, as in Oracle's code, choose a short variable name ("sb" in that case). But the Android version uses the long name "stringbuilder". This is just as anti-conventional as "iterator" for a loop variable.
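For readers less familiar with Java, here is a minimal sketch of the stringbuilder idiom with the conventional short name. The class BuilderDemo, the method names, and the output format are my own hypothetical choices, not the "asString" function from either file.

```java
// Hypothetical example of composing a string with a StringBuilder.
class BuilderDemo {
    // Conventional style: a short name for a variable used on almost every line.
    static String describe(String policy, boolean critical, int depth) {
        StringBuilder sb = new StringBuilder();
        sb.append("policy=").append(policy);
        sb.append(", critical=").append(critical);
        sb.append(", depth=").append(depth);
        return sb.toString();
    }

    // Plain concatenation: same result, but each += creates a new String.
    static String describeWithConcat(String policy, boolean critical, int depth) {
        String text = "";
        text += "policy=" + policy;
        text += ", critical=" + critical;
        text += ", depth=" + depth;
        return text;
    }

    public static void main(String[] args) {
        System.out.println(describe("p", true, 2));
    }
}
```

Renaming sb to stringbuilder would change nothing about the result. It would only make every line longer.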

My assessment of the situation based on the information available at this point

While Oracle's Exhibit J contains only a somewhat limited amount of code, I have a really strong feeling that Oracle's copyright has been infringed and that someone made a rather pathetic attempt to veil that infringement. The means were a naming approach that reverses the conventions of the original code (with the net effect that variable names in the Android version are descriptive where every reasonable programmer would keep them short and short where every reasonable programmer would want them to be descriptive), an excessive use of braces, and loop structures that are in part so bad that no one could possibly pass a computer science exam with such an abysmal coding style.

Even though the interface those code excerpts implement is of limited complexity, an independent creation of the Android version of the code that innocently happened to arrive at code so similar to Oracle's version isn't plausible to me at all. The differences are superficial, and where the Android version is different at all, it's different in a way that flies in the face of everything I know about coding. As a software developer and a former computer book author, I hope I know a thing or two about that...

I will take a look at the copyright aspects of Oracle vs. Google again whenever more information becomes available. For now, if I had to bet money, I would without hesitation bet it on Oracle's claim that what we see here is indeed a copyright infringement. Who wrote the apparently infringing code is another question, but like I explained further above, from the perspective of Android application developers there's absolutely no comfort in that because the potential effects on developers are independent from the answer to that secondary question.
