Thursday, December 17, 2009

Faster reading of UTF-8 encoded files on Android

I created an Android application that reads some text files from a raw resource. The text files are encoded in UTF-8, so I started with the obvious code to convert bytes from the file into characters:

InputStreamReader in = new InputStreamReader(new BufferedInputStream(resources.openRawResource(R.raw.textfile)), "UTF-8");
int c = in.read(); // read a character, and so on.

But reading a 10 KB file took almost a minute on the Android 1.5 emulator! I wondered what made it so slow; on my Nokia phone, the same program written in Java ME takes less than a second to do the same thing.

By using Traceview, I found that most of the time was spent on UTF-8 decoding from bytes to characters. Android's Java implementation uses IBM ICU for character encoding, which seems to be overkill for just decoding UTF-8. Hence, the solution is to write my own UTF-8 decoder. (Some concepts are taken from the Go source, minus the error-checking overhead, and it only handles characters of at most 16 bits, that is, sequences of up to 3 bytes.)

public class Utf8Reader implements Closeable {
    private InputStream in_;
    public static final char replacementChar = 0xFFFD;

    public Utf8Reader(InputStream in) {
        in_ = in;
    }

    public int read() throws IOException {
        int c0 = in_.read();

        if (c0 == -1) {
            // EOF
            return -1;
        }

        if (c0 < 0x80) {
            // input 1 byte, output 7 bit
            return c0;
        }

        int c1 = in_.read();

        if (c1 == -1) {
            // partial EOF
            return -1;
        }

        if (c0 < 0xe0) {
            // input 2 byte, output 5+6 = 11 bit
            return ((c0 & 0x1f) << 6) | (c1 & 0x3f);
        }

        int c2 = in_.read();

        if (c2 == -1) {
            // partial EOF
            return -1;
        }

        // input 3 byte, output 4+6+6 = 16 bit
        return ((c0 & 0x0f) << 12) | ((c1 & 0x3f) << 6) | (c2 & 0x3f);
    }

    @Override
    public void close() throws IOException {
        in_.close();
    }
}

(Please add the required imports yourself.) The result is satisfying: the 10 KB file now loads in about 1 second on the emulator, and almost instantly on the device.
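
The reader above stops at 3-byte sequences. As a rough sketch only (not part of my application), the same bit-masking idea can be extended to 4-byte sequences, which Java stores as surrogate pairs. The class and method names below are made up for illustration, and malformed lead or continuation bytes are still not validated:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class Utf8DecodeSketch {
    private static final int REPLACEMENT = 0xFFFD;

    /** Reads `count` continuation bytes and appends their 6-bit payloads to `lead`. */
    private static int readContinuation(InputStream in, int lead, int count) throws IOException {
        int cp = lead;
        for (int i = 0; i < count; i++) {
            int b = in.read();
            if (b == -1) {
                return -1; // input ended in the middle of a sequence
            }
            cp = (cp << 6) | (b & 0x3f);
        }
        return cp;
    }

    /** Decodes the whole stream; malformed lead or continuation bytes are not validated. */
    public static String decodeAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c0;
        while ((c0 = in.read()) != -1) {
            int cp;
            if (c0 < 0x80) {
                cp = c0;                                  // 1 byte, 7 bits
            } else if (c0 < 0xe0) {
                cp = readContinuation(in, c0 & 0x1f, 1);  // 2 bytes, 5+6 bits
            } else if (c0 < 0xf0) {
                cp = readContinuation(in, c0 & 0x0f, 2);  // 3 bytes, 4+6+6 bits
            } else {
                cp = readContinuation(in, c0 & 0x07, 3);  // 4 bytes, 3+6+6+6 bits
            }
            if (cp == -1) {
                sb.appendCodePoint(REPLACEMENT);
                break;
            }
            // appendCodePoint emits a surrogate pair for code points above U+FFFF
            sb.appendCodePoint(Character.isValidCodePoint(cp) ? cp : REPLACEMENT);
        }
        return sb.toString();
    }

    /** Convenience wrapper for in-memory data. */
    public static String decode(byte[] bytes) {
        try {
            return decodeAll(new ByteArrayInputStream(bytes));
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected for an in-memory stream
        }
    }
}
```

When reading from a file or resource, wrap the stream in a BufferedInputStream first, since the decoder pulls one byte at a time.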

Monday, November 9, 2009

Base64 in PHP and Python

Today I calculated hash values based on Base64-encoded strings, once in PHP and once in Python. Both should return the same value, because the hashes are compared for verification.

So, in PHP, the code is

myHashFunction(base64_encode('original string')) 

And in Python, the code is

myHashFunction(base64.encodestring('original string'))

Dangerous! The results are different! Since the real 'original string' was not as simple as that, I first thought I had passed the wrong data. But after some checking, it turned out that base64_encode and base64.encodestring themselves returned different results.

base64_encode('original string') returns "b3JpZ2luYWwgc3RyaW5n"

whereas base64.encodestring('original string') returns "b3JpZ2luYWwgc3RyaW5n\n"

More precisely, base64.encodestring adds a newline character at the end (and after every 76 characters of output), which suits email attachments, whereas base64_encode does not.

An easy way to make them identical is to strip the newlines in the Python version: base64.encodestring('original string').replace('\n', ''). Alternatively, base64.b64encode('original string') returns the encoded string without any newlines.
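
The same trap exists in other languages. As an illustration only (Java was not part of my setup, and the class name is made up), java.util.Base64 offers both a plain encoder and a MIME encoder. The MIME encoder breaks lines with CRLF after every 76 characters but, unlike Python's encodestring, adds no trailing newline. Stripping the line breaks before hashing makes the two outputs agree:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64NewlineSketch {
    /** Plain Base64: one unbroken line, comparable to PHP's base64_encode. */
    public static String encodePlain(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    /** MIME-style Base64: a line break after every 76 output characters. */
    public static String encodeMime(byte[] data) {
        return Base64.getMimeEncoder().encodeToString(data);
    }

    /** Removes CR and LF so that both variants produce the same text to hash. */
    public static String stripLineBreaks(String encoded) {
        return encoded.replace("\r", "").replace("\n", "");
    }

    public static void main(String[] args) {
        byte[] shortInput = "original string".getBytes(StandardCharsets.UTF_8);
        System.out.println(encodePlain(shortInput));

        byte[] longInput = new byte[100]; // long enough to force a line break
        String plain = encodePlain(longInput);
        String mime = encodeMime(longInput);
        System.out.println("identical as produced: " + plain.equals(mime));
        System.out.println("identical after stripping: " + plain.equals(stripLineBreaks(mime)));
    }
}
```

For the 100-byte input the first comparison should be false and the second true, which is exactly the kind of mismatch that broke my hashes.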

Monday, October 12, 2009

find -exec equivalent for Windows cmd

I was looking for a replacement for the shell (bash) command:
find -name '.svn' -exec rm -rf {} \;

for Windows cmd.exe.

The purpose is to remove all .svn directories from a directory recursively.

In cmd you can use dir /b /s to list a directory and all of its subdirectories in bare format, one full path per line. For example:

C:\WINDOWS\system32\config>dir /s /b
C:\WINDOWS\system32\config\AppEvent.Evt
C:\WINDOWS\system32\config\system.sav
C:\WINDOWS\system32\config\systemprofile
C:\WINDOWS\system32\config\userdiff
C:\WINDOWS\system32\config\systemprofile\Desktop
C:\WINDOWS\system32\config\systemprofile\Favorites
C:\WINDOWS\system32\config\systemprofile\My Documents
C:\WINDOWS\system32\config\systemprofile\Start Menu

...

That format is already similar to what the find command prints. So how do we execute a program with arguments taken from this list?

The answer is to use the FOR command with the /F "usebackq" option.

We put the command in backquotes, like this:

for /F "usebackq delims=" %i in (`dir /s /b /ad *.svn`) do rmdir /s /q "%i"

Here /ad limits the listing to directories, and "delims=" together with the quotes around %i keeps paths that contain spaces in one piece. Problem solved. Remember to double the percent signs (%%i) if you do this in a batch file.
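
For comparison, the same cleanup can be sketched as a small Java program that runs on both Windows and Unix. This is only an illustration: the class and method names are made up, and read-only files inside .svn (which Subversion may create) are not handled and would make the delete fail.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

public class RemoveSvnDirs {
    /** Deletes `dir` and everything below it, children first. */
    static void deleteTree(Path dir) throws IOException {
        Files.walkFileTree(dir, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                Files.delete(file);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path d, IOException exc) throws IOException {
                if (exc != null) {
                    throw exc;
                }
                Files.delete(d);
                return FileVisitResult.CONTINUE;
            }
        });
    }

    /** Removes every directory named ".svn" under `root`; returns how many were removed. */
    public static int removeSvnDirs(Path root) {
        final int[] removed = {0};
        try {
            Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) throws IOException {
                    Path name = dir.getFileName();
                    if (name != null && name.toString().equals(".svn")) {
                        deleteTree(dir);
                        removed[0]++;
                        return FileVisitResult.SKIP_SUBTREE; // already deleted, do not descend
                    }
                    return FileVisitResult.CONTINUE;
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return removed[0];
    }

    public static void main(String[] args) {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println("Removed " + removeSvnDirs(root) + " .svn directories");
    }
}
```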

Friday, September 25, 2009

How to copy error message or anything from message box

If you get an error message, warning, or status information in a message box, how can you copy the text to pass it on to someone more expert?

The answer is simple: Press Ctrl + C when the message box appears. Although you can't select the text with your mouse, just press Ctrl + C and you will get the title, message, and buttons copied to your clipboard.

Sample output:


---------------------------
notped
---------------------------
Windows cannot find 'notped'. Make sure you typed the name correctly, and then try again. To search for a file, click the Start button, and then click Search.
---------------------------
OK   
---------------------------

I have been using Windows XP since 2003 and only learned this a month ago! What a shame.

Tuesday, September 15, 2009

Extract required parameters to variables with a single line of code! (PHP)

[Note: this is for those not using sophisticated PHP frameworks (ZF, CakePHP, etc.), but using plain old PHP (Smarty is still OK and great!)]

How many times have you wanted to extract selected GET or POST or COOKIE variables into local/global variables to be used easily? Something like:

$submit = (int) $_REQUEST['submit'];
$name = $_REQUEST['name'];
$email = $_REQUEST['email'];
$preference = (int) $_REQUEST['preference'];

if ($submit) {
    if (!$name and !$email) {
        $error = 'Please fill in your name and email';
    } else {
        sql("insert into POST values (?, ?, ?)", $name, $email, $preference);
    }
}

And then at another page, you need to do the same thing over and over again (and you don't actually care whether it's GET or POST or...)

$start = (int) $_GET['start'];
$length = (int) $_GET['length'];
$locale = $_GET['locale'];

Solution: I always use the following function to extract request variables into the global scope:

// first example
getVars('submit name email preference');
echo "Your name is $name"; // the variable $name is now available
// second example
getVars('start length locale');

Isn't it convenient?

What does the getVars() function look like?

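function getVars($vars) {
    if (is_string($vars)) {
        $vars = explode(' ', $vars); // split() is deprecated since PHP 5.3
    }
    foreach ($vars as $var) {
        // a missing parameter becomes null instead of raising a notice
        $GLOBALS[$var] = isset($_REQUEST[$var]) ? $_REQUEST[$var] : null;
    }
}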

The trick is to copy the variables from $_REQUEST to $GLOBALS.
Now let's enhance it so that it converts the variables to int when needed:

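function getVars($vars) {
    if (is_string($vars)) {
        $vars = explode(' ', $vars); // split() is deprecated since PHP 5.3
    }
    foreach ($vars as $var) {
        $modifier = '';
        // an optional "/modifier" suffix selects a conversion
        if (strpos($var, '/') !== false) {
            $tp = explode('/', $var);
            $var = $tp[0];
            $modifier = $tp[1];
        }
        // a missing parameter becomes null instead of raising a notice
        $value = isset($_REQUEST[$var]) ? $_REQUEST[$var] : null;
        if ($modifier == 'hex') {
            $value = pack("H*", $value);
        } elseif ($modifier == 'int') {
            $value = (int)$value;
        }
        $GLOBALS[$var] = $value;
    }
}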

Going back to the examples, we now have:

// first example
getVars('submit/int name email preference/int');
// second example
getVars('start/int length/int locale');

Such a useful function. Let us use it!

Tip: you can also write '/hex' to convert "414243" to "ABC".

Friday, August 21, 2009

Escaping database column name when using Hibernate

Changing the database engine when using Hibernate turned out to be not as easy as changing the .hbm.xml file.
At first I used Apache Derby (client-server) as the database engine, just for testing purposes. I then tried to switch to Microsoft SQL Server by using the JDBC driver provided and changing the .hbm.xml file. A problem appeared: a String field whose length I had set to 40000 could not be handled, so I needed to truncate it to 8000. Later, when I switched to Oracle, a single table could not have two LONG columns, so I needed to reduce the length further to 4000 characters.
The problems did not stop there. I use Hibernate Annotations, so every field is configured in the source code (instead of in the .hbm.xml file). I can define the column names if I want:
@Entity
public class Entry {
  String message;

  @Column(name = "log_time")
  Date time;
}
In that case, the message field will be stored in a database column called message, but the time field will be stored in a database column called log_time since I have defined a @Column annotation.
The sad truth is that Hibernate does not escape column names in the queries it generates, so a column name that happens to be a reserved word in some DBMS breaks the query. Examples include exception in Oracle and size in Derby.
We can force Hibernate to escape an identifier in its SQL queries by putting backticks (`) around the column name. The backticks are converted to the quoting style of the target database system: `column` in MySQL, [column] in Microsoft SQL Server, and "column" in Apache Derby.
So, our entity class becomes something like:
@Entity
public class LogEntry {
  @Column(name = "`level`")
  String level;

  @Column(name = "`exception`", length = 4000)
  String exception;

  // etc.
}
Unfortunately, we must now write the name twice (once on the field and once in the annotation), which makes the code harder to maintain (e.g. after a rename refactoring of the field, we may not notice that the column name in the annotation has not changed with it).
I still wonder why Hibernate doesn't always escape column names.
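
To picture the backtick translation, here is a toy sketch. It is not Hibernate's code; the class, the enum and the three dialects are made up for illustration. A name wrapped in backticks is rewritten into the quoting style of the chosen database, and any other name passes through untouched:

```java
public class IdentifierQuotingSketch {
    /** Hypothetical dialect list, for illustration only. */
    public enum Dialect {
        MYSQL('`', '`'), SQL_SERVER('[', ']'), DERBY('"', '"');

        final char open;
        final char close;

        Dialect(char open, char close) {
            this.open = open;
            this.close = close;
        }
    }

    /** Rewrites a backtick-quoted name into the dialect's quoting; other names pass through. */
    public static String quote(String name, Dialect dialect) {
        boolean quoted = name.length() >= 2 && name.startsWith("`") && name.endsWith("`");
        if (!quoted) {
            return name;
        }
        String bare = name.substring(1, name.length() - 1);
        return dialect.open + bare + dialect.close;
    }

    public static void main(String[] args) {
        for (Dialect d : Dialect.values()) {
            System.out.println(d + ": " + quote("`exception`", d) + ", " + quote("log_time", d));
        }
    }
}
```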

Tuesday, August 11, 2009

Excluding specific folder from directory crawlers

In short: I want to have a directory D:\bekap\mybackup that cannot be reached by traversing the file system.
I made a backup program that backs up data from a specified directory to another directory. (The program has many more features, such as detecting duplicate data and efficient incremental backup.) I wanted to back up the whole D: drive, but that is also the only partition available as the backup destination. Therefore I can only back up D: to, let's say, D:\bekap\mybackup.
The problem is that when the program walks through D: to find files, it will eventually reach D:\bekap\mybackup and try to back up the files inside D:\bekap\mybackup into itself. (It's the same problem as with tar -cf archive.tar .)
So I want to make D:\bekap\mybackup undetectable except by accessing it directly through its path name. It is like having a non-linked web page at http://example.com/private_photos/xyz/; only people to whom you give the address can access it.
After trying several options, I found a way to do that:
  • Go to the Security properties of D:\bekap (in Explorer: right-click, Properties; you may need to disable Use simple file sharing in Folder Options first).
  • Click the Advanced button to see the list of permission entries.
  • Click Add, fill in your username, click Check Names, then OK.
  • Tick the Deny column for the entry List Folder / Read Data.
Now you cannot list the contents of D:\bekap, but if you type D:\bekap\mybackup, you can open its contents!
Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.

D:\>cd bekap

D:\bekap>dir
Volume in drive D is Dadu
Volume Serial Number is 1293-1777

Directory of D:\bekap

File Not Found

D:\bekap>cd mybackup

D:\bekap\mybackup>dir
Volume in drive D is Dadu
Volume Serial Number is 1293-1777

Directory of D:\bekap\mybackup

2009-08-11  15:22    <DIR>          .
2009-08-11  15:22    <DIR>          ..
2009-08-11  15:22                 0 .bekapkeren2
2009-08-11  15:22        35,616,973 .cache.ser
2009-08-07  17:43    <DIR>          data
2009-08-11  15:21    <DIR>          fs
          2 File(s)     35,616,973 bytes
          4 Dir(s)  82,628,956,160 bytes free

Thursday, August 6, 2009

Multiple Return Values in Java

How can we have multiple return values in Java? Formally we cannot, since each method has either no return value (void) or a single return value (int, long, Object, String, etc.).

One way to emulate it is to create a class that contains one field for each return value. But that gets tiring, because you need to create such a class every time you need one (not counting the time spent thinking about what the class name should be).

In my case, I solved this problem by using a Pointer class. It works like C, where you can get additional return values by passing to the function a pointer to a value that it will modify, as in: size = fread(pbuf, size, count, pfile);

The pointer class is as follows:

public class Pointer<T> {
  public T value;
  public static <T> Pointer<T> create() {
    return new Pointer<T>();
  }
}

Let's say you have a method that needs to return two values: result and error code. The method for our example will be:

byte[] downloadFile(String url, Pointer<Integer> errorCode) {
  byte[] data = null;
  // ...(really download into data)...
  if (errorCode != null) {
    errorCode.value = 200; // example only
  }
  return data;
}

To use the method, we first create the pointer to store the error code as follows:

Pointer<Integer> errorCode = Pointer.create();
byte[] file = downloadFile("http://biginteger.blogspot.com/", errorCode);
System.out.printf("File downloaded (%d bytes) with error code %d", file.length, errorCode.value);

The purpose of create() is to avoid typing the type parameter twice, as you would have to in:

Pointer<Integer> errorCode = new Pointer<Integer>();
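
For a version you can compile and run as is, here is a small sketch of the same idea. The divide method and the PointerDemo class are made-up examples, and Pointer is repeated as a nested class only so that the file stands on its own:

```java
public class PointerDemo {
    /** Same shape as the Pointer class above, nested here so the sketch compiles alone. */
    public static class Pointer<T> {
        public T value;

        public static <T> Pointer<T> create() {
            return new Pointer<T>();
        }
    }

    /** Returns the quotient and, if a pointer is supplied, stores the remainder in it. */
    public static int divide(int dividend, int divisor, Pointer<Integer> remainder) {
        if (remainder != null) {
            remainder.value = dividend % divisor;
        }
        return dividend / divisor;
    }

    public static void main(String[] args) {
        Pointer<Integer> remainder = Pointer.create();
        int quotient = divide(17, 5, remainder);
        System.out.printf("17 / 5 = %d remainder %d%n", quotient, remainder.value);
    }
}
```

Passing null for the pointer means the caller is not interested in the second value, just like the errorCode check in downloadFile.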

Wednesday, August 5, 2009

Setting GWT locale using cookies

GWT represents locale as a client property whose value can be set either using a meta tag embedded in the host page or in the query string of the host page's URL.
I wanted the locale of my GWT application to be stored somewhere, so that I don't need to append ?locale=ja or &locale=ja to the URL of every page on the site. Emitting the meta tag on the server every time the page is requested is not an attractive option either, since the server would then need to know the user's preference (session management).
For example, if I have http://www.ngambek.com/ page in Indonesian, but I want to set it to Japanese, I need to set the address to http://www.ngambek.com/?locale=ja to let the GWT system use the Japanese version of the page. Later when there is a link to http://www.ngambek.com/12345678, I would need to make it http://www.ngambek.com/12345678?locale=ja. What a burden.
I decided to use a cookie to store the preferred locale, and let a JavaScript snippet write the meta tag dynamically. By the time the GWT module is loaded, the appropriate meta tag has already been written (in the browser's memory), so the locale is detected.
So, before the GWT module script is loaded, I put these lines in the HTML file:
var getCookie = function(c_name) {
  if (document.cookie.length > 0) {
    // declared with var so they do not leak into the global scope
    var c_start = document.cookie.indexOf(c_name + "=");
    if (c_start != -1) {
      c_start = c_start + c_name.length + 1;
      var c_end = document.cookie.indexOf(";", c_start);
      if (c_end == -1) c_end = document.cookie.length;
      return unescape(document.cookie.substring(c_start, c_end));
    }
  }
  return "";
};

var locale = getCookie("locale");
if (! locale) {
  locale = "id";
}

document.write("<meta name='gwt:property' content='locale=" + locale + "' />");
Then, in order to set the locale, it's just a matter of setting the locale cookie and reloading the page:
function setLocale(locale) {
  document.cookie = "locale=" + escape(locale);
  document.location.href = document.location.href;
}
By the way, the internationalization feature of GWT is great! Try it if you haven't tried it before.