zVarchar example

As mentioned in a previous entry, I decided to try to build a compressed text data type as an example of a UDT. So here it is. I built a datatype called zVarchar. To keep things as a reasonable demo, I didn’t incorporate complex compression, just a simple run length encoding scheme and built it in VB so most should be able to read it. But it’s interesting. If you want to try it, start a new SQL Server project in VS2005, add a User Defined Type and include the code below. After building and deploying it, you can do the following:


CREATE TABLE GregTest(


    RecID int IDENTITY(1,1),


    TextValue zVarchar


)


I then tried storing values in it. On small varchar values, the overhead of the UDT meant the data was actually larger than a varchar. But, I tried some examples with large varchar values (around 1000 characters per row and about 40% compressible) and got good results. My table using the zVarchar data type was around 50% of the size of a table using varchar, with 10,000 rows in the table. To keep it interesting, I added some additional functions to allow access to the compressed value, the length (both compressed and uncompressed), etc.


You can then execute queries like:


SELECT TextValue.CompressedLength() FROM GregTest


SELECT AVG(TextValue.CompressionPercentage()) FROM GregTest


Enjoy!


Imports System
Imports System.Data.Sql
Imports System.Data.SqlTypes
Imports System.Text


<Serializable()> _
<SqlUserDefinedType(Format.SerializedDataWithMetadata, MaxByteSize:=8000)> _
Public Class zVarchar
    Implements INullable


    Private ValueIsNull As Boolean = True
    Private CurrrentValue As String = “”


    Public Overrides Function ToString() As String
        If Me.ValueIsNull Then
            Return “NULL”
        Else
            Return Decompress(Me.CurrrentValue)
        End If
    End Function


    Public ReadOnly Property IsNull() As Boolean Implements INullable.IsNull
        Get
            Return Me.ValueIsNull
        End Get
    End Property


    Public Function CompressedLength() As Integer
        If Me.ValueIsNull Then
            Return 0
        Else
            Return Me.CurrrentValue.Length
        End If
    End Function


    Public Function CompressedValue() As String
        If Me.ValueIsNull Then
            Return “NULL”
        Else
            Return Me.CurrrentValue
        End If
    End Function


    Public Function DeCompressedLength() As Integer
        If Me.ValueIsNull Then
            Return 0
        Else
            Return zVarchar.Decompress(Me.CurrrentValue).Length
        End If
    End Function


    Public Function CompressionPercentage() As Decimal
        If Me.ValueIsNull Then
            Return 0
        Else
            Dim compressedLength As Integer
            Dim decompressedLength As Integer


            compressedLength = Me.CurrrentValue.Length
            decompressedLength = Me.DeCompressedLength
            If decompressedLength = 0 Then
                Return 0
            Else
                Return Decimal.Round(CType(compressedLength, Decimal) * 100 / CType(decompressedLength, Decimal), 2)
            End If
        End If
    End Function


    Public Shared ReadOnly Property Null() As zVarchar
        Get
            Dim newNullValue As zVarchar = New zVarchar


            Return newNullValue
        End Get
    End Property


    Public Const EscapeCharacter As Char = “\”c


    Public Const MaximumRepeatValue As Integer = 250


    Public Shared Function Parse(ByVal s As SqlString) As zVarchar
        If s.IsNull Or s.Value.ToLower().Equals(“null”) Then
            Dim newNullValue As New zVarchar


            Return newNullValue
        Else
            Dim zValue As New StringBuilder
            Dim objectToReturn As New zVarchar


            objectToReturn.ValueIsNull = False
            objectToReturn.CurrrentValue = zVarchar.Compress(s.ToString())


            Return objectToReturn
        End If
    End Function


    Public Shared Function Compress(ByVal ValueToCompress As String) As String
        Dim compressedValue As New StringBuilder
        Dim lastValue As Char
        Dim valueCount As Integer = 0
        Dim currentValue As Char
        Dim characterCounter As Integer


        For characterCounter = 0 To (ValueToCompress.Length – 1)
            currentValue = CType(ValueToCompress.Substring(characterCounter, 1), Char)
            If currentValue = zVarchar.EscapeCharacter Then
                If valueCount > 0 Then ‘ we have something to output first
                    If valueCount < 3 Then
                        compressedValue.Append(lastValue, valueCount)
                    Else
                        compressedValue.Append(zVarchar.EscapeCharacter)
                        compressedValue.Append(lastValue)
                        compressedValue.Append(Chr(valueCount))
                    End If
                    valueCount = 0
                End If
                compressedValue.Append(zVarchar.EscapeCharacter, 2)
                compressedValue.Append(Chr(1))
            Else ‘ if we don’t have an escape character
                If valueCount = 0 Then
                    lastValue = currentValue
                    valueCount = 1
                Else ‘ could be a repeated character
                    If lastValue = currentValue Then
                        If valueCount > zVarchar.MaximumRepeatValue Then ‘ needs to still fit in a byte
                            compressedValue.Append(zVarchar.EscapeCharacter)
                            compressedValue.Append(lastValue)
                            compressedValue.Append(Chr(valueCount))
                            valueCount = 0
                        End If
                        valueCount += 1
                    Else ‘ different character
                        ‘ output the cached value first
                        If valueCount < 3 Then
                            compressedValue.Append(lastValue, valueCount)
                        Else
                            compressedValue.Append(zVarchar.EscapeCharacter)
                            compressedValue.Append(lastValue)
                            compressedValue.Append(Chr(valueCount))
                        End If
                        lastValue = currentValue
                        valueCount = 1
                    End If
                End If


            End If
        Next characterCounter
        If valueCount > 0 Then ‘ we have something cached still to output
            If valueCount < 3 Then
                compressedValue.Append(lastValue, valueCount)
            Else
                compressedValue.Append(zVarchar.EscapeCharacter)
                compressedValue.Append(lastValue)
                compressedValue.Append(Chr(valueCount))
            End If
        End If


        Return compressedValue.ToString()
    End Function


    Public Shared Function Decompress(ByVal ValueToDecompress As String) As String
        Dim decompressedValue As New StringBuilder
        Dim characterCounter As Integer = 0


        Do While characterCounter < ValueToDecompress.Length
            If CType(ValueToDecompress.Substring(characterCounter, 1), Char) = zVarchar.EscapeCharacter Then
                decompressedValue.Append(CType(ValueToDecompress.Substring(characterCounter + 1, 1), Char), Asc(ValueToDecompress.Substring(characterCounter + 2, 1)))
                characterCounter += 2
            Else
                decompressedValue.Append(ValueToDecompress.Substring(characterCounter, 1))
            End If
            characterCounter += 1
        Loop
        Return decompressedValue.ToString()
    End Function
End Class


 

2 thoughts on “zVarchar example

  1. This is the first query example I’ve seen.

    Seems like a shame Yukon isn’t going full OQL and allowing us to query object graphs. This seems halfway there, alleviating the need for Component/Embedded values, but if it could support relations in the underlying fields it’d be much closer to a full OODBMS.

    I suppose trying to hack it into a partial one through rows as UDT’s is discouraged?

Leave a Reply

Your email address will not be published. Required fields are marked *