Sunday, 17 June 2012

How to calculate the record size of HBase?

HBase database is used to store the large amount of data. The size of each record can really make a huge difference in the total storage requirements for the solution. For example to store 1 billion records with just 1 extra byte needs ~1 GB extra disk space.

The following section covers the details of the record in HBase. The record consist of the Row ID, Column Family Name, Column Qualifier, Timestamp and the Value.

As HBase is the column oriented database, it stores each value with fully qualified row key. For example to store the employee data with employeeId, firstName and lastName, the fully qualified row key is repeated for each column. So what is the total disk space required to store this data?

First, the following shows the scan of the employee table with actual data:
hbase(main):011:0> scan 'employee'
ROW                     COLUMN+CELL
 row1                   column=data:employeeId, timestamp=1339960912955, value=123
 row1                   column=data:firstName, timestamp=1339960810908, value=Joe
 row1                   column=data:lastName, timestamp=1339960868771, value=Robert

HBase KeyValue format

HBase stores the data in KeyValue format. The following picture shows the details including the data types and the size required by each field:

Fig 1 : Key Value Format of HBase

So to calculate the record size:
Fixed part needed by KeyValue format = Key Length + Value Length + Row Length + CF Length + Timestamp + Key Value = ( 4 + 4 + 2 + 1 + 8 + 1) = 20 Bytes

Variable part needed by KeyValue format = Row + Column Family + Column Qualifier + Value


Total bytes required = Fixed part + Variable part

So for the above example let's calculate the record size:
First Column = 20 + (4 + 4 + 10 + 3) = 41 Bytes
Second Column = 20 + (4 + 4 + 9 + 3) = 40 Bytes
Third Column = 20 + (4 + 4 + 8 + 6) = 42 Bytes

Total Size for the row1 in above example = 123 Bytes


To Store 1 billion such records the space required = 123 * 1 billion = ~ 123 GB

Please see the following snapshot of KeyValue.java for details of the fields type and size in HBase KeyValue:

  static byte [] createByteArray(final byte [] row, final int roffset,
      final int rlength, final byte [] family, final int foffset, int flength,
      final byte [] qualifier, final int qoffset, int qlength,
      final long timestamp, final Type type,
      final byte [] value, final int voffset, int vlength) {
      ..
      ..
// Allocate right-sized byte array.
    byte [] bytes = new byte[KEYVALUE_INFRASTRUCTURE_SIZE + keylength + vlength];
    // Write key, value and key row length.
    int pos = 0;
    pos = Bytes.putInt(bytes, pos, keylength);
    pos = Bytes.putInt(bytes, pos, vlength);
    pos = Bytes.putShort(bytes, pos, (short)(rlength & 0x0000ffff));
    pos = Bytes.putBytes(bytes, pos, row, roffset, rlength);
    pos = Bytes.putByte(bytes, pos, (byte)(flength & 0x0000ff));
    if(flength != 0) {
      pos = Bytes.putBytes(bytes, pos, family, foffset, flength);
    }
    if(qlength != 0) {
      pos = Bytes.putBytes(bytes, pos, qualifier, qoffset, qlength);
    }
    pos = Bytes.putLong(bytes, pos, timestamp);
    pos = Bytes.putByte(bytes, pos, type.getCode());
    if (value != null && value.length > 0) {
      pos = Bytes.putBytes(bytes, pos, value, voffset, vlength);
    }
    return bytes;
}

1 comment:

  1. SQIAR (http://www.sqiar.com/solutions... is a leading Business Intelligence company and provides Tableau Software consultancy in United Kingdom and USA.

    ReplyDelete