Reverse Engineer Unknown Data Format Using Elixir

Reverse Engineer Unknown Data Format Using Elixir

Table of Contents

I recently worked on porting the csv file converter for AdventureWorks database from ruby to elixir. This ended successful and I have all the data in my local postgresql to play with. By browsing the tables I found spatial_location column in person.address table and I wanted to decode it.

|address_line1       |city   |spatial_location                            |
|--------------------|-------|-------------------------------------------|
|1970 Napa Ct.       |Bothell|E6100000010CAE8BFC28BCE4474067A89189898A5EC0|
|9833 Mt. Dias Blv.  |Bothell|E6100000010CD6FA851AE6D74740BC262A0A03905EC0|
|7484 Roundtree Drive|Bothell|E6100000010C18E304C4ADE14740DA930C7893915EC0|
|9539 Glenside Dr    |Bothell|E6100000010C813A0D5F9FDE474011A5C28A7C955EC0|
|1226 Shoe St.       |Bothell|E6100000010C61C64D8ABBD94740C460EA3FD8855EC0|
|1399 Firestone Drive|Bothell|E6100000010CE0B4E50458DA47402F12A5F80C975EC0|
|5672 Hale Dr.       |Bothell|E6100000010C18E304C4ADE1474011A5C28A7C955EC0|
|6387 Scenic Avenue  |Bothell|E6100000010C0029A5D93BDF4740E248962FD5975EC0|
|8713 Yosemite Ct.   |Bothell|E6100000010C6A80AD742DDC4740851574F7198C5EC0|

As you can see the data looks like binary data. I found even some documentation from microsoft that may be useful here but in the end I could not use it with luck. So what I end up doing to decode it.

First what I know and may be useful:

  • I got the city and address
  • I know that the data is binary encoded as string because there is nothing higher than F.
  • The string is 44 letters long - every byte is encoded using 2 letters so I have 22 bytes of data

Let’s jump to iex:

iex(1)> encoded = "E6100000010C6A80AD742DDC4740851574F7198C5EC0"
"E6100000010C6A80AD742DDC4740851574F7198C5EC0"
iex(2)> String.length(encoded)
44
iex(3)> decoded = Base.decode16!(encoded)
<<230, 16, 0, 0, 1, 12, 106, 128, 173, 116, 45, 220, 71, 64, 133, 21, 116, 247,
  25, 140, 94, 192>>

Next thing that I did was to find what exactly can be hidden in my data: The beginning and end look the same between the first lines I found - it may be some kind of wrapper and I may need to skip it. I also searched for the actual coordinates of Bothell to see if I can see some similarities:

Bothell coordinates

It looks like the data encoded is around 47.5 and -122.5 so let’s move back to iex.

iex(3)> decoded = Base.decode16!(encoded)
<<230, 16, 0, 0, 1, 12, 106, 128, 173, 116, 45, 220, 71, 64, 133, 21, 116, 247,
  25, 140, 94, 192>>
iex(4)> <<47.0::float>>
<<64, 71, 128, 0, 0, 0, 0, 0>>
iex(5)> <<47.99::float>>
<<64, 71, 254, 184, 81, 235, 133, 31>>

I tried how float 47 looks like in binary form. It starts with 64, 71 and has 8 bytes. Looking at the encoded data I have 64 and even 71 - just in reverse order, is it a coincidence?? Let’s try with -122

<<230, 16, 0, 0, 1, 12, 106, 128, 173, 116, 45, 220, 71, 64, 133, 21, 116, 247,
  25, 140, 94, 192>>
iex(7)> <<-122.0::float>>                
<<192, 94, 128, 0, 0, 0, 0, 0>>
iex(8)> <<-122.9::float>>
<<192, 94, 185, 153, 153, 153, 153, 154>>

192, 94 - again I can see similar pattern but in reverse order at the end. It turns out that I can use the little endian modifier to create similar looking data:

iex(9)> <<-122.9::little-float>>
<<154, 153, 153, 153, 153, 185, 94, 192>>

Hmm - definitely looks similar. Especially that it’s 8 numbers from the end and If I look at the encoded data then on the 9th position I have 64 which is the possible beginning of 47.0 encoded as binary. When removing this data I end up with 6 bits at the beginning. Back to iex:

iex(10)> <<_::binary-size(6), x::little-float, y::little-float>> = decoded
<<230, 16, 0, 0, 1, 12, 106, 128, 173, 116, 45, 220, 71, 64, 133, 21, 116, 247,
  25, 140, 94, 192>>
iex(11)> x
47.7201372000862
iex(12)> y
-122.189084876407

Was it really so easy?? Close but I can’t see the street 8713 Yosemite Ct.

First location on map

Let’s try with another one: 6387 Scenic Avenue, Bothell

iex(15)> <<_::binary-size(6), x::little-float, y::little-float>> = 
iex(..)> Base.decode16!("E6100000010C0029A5D93BDF4740E248962FD5975EC0")  
<<230, 16, 0, 0, 1, 12, 0, 41, 165, 217, 59, 223, 71, 64, 226, 72, 150, 47, 213,
  151, 94, 192>>
iex(16)> x
47.7440139824339
iex(17)> y
-122.372386833918

Second location on map

Again it’s not a complete hit. Let’s see if we can be more specific? From the ms docs I found that the encoded data starts with SRID

SRID (4 bytes): (32 bit integer) The SRID for the geography. GEOGRAPHY structures MUST use
SRID values in the range of 4120 through 4999, inclusive, with the exception of null geographies.
A value of -1 indicates a null geography. When a null geography is indicated, all other fields are
omitted. Default SRID for GEOGRAPHY instances is 4326. Default SRID for GEOMETRY instances is
zero (0). For GEOMETRY instance, SRID can be any value: SRID is not constrained.

Let’s again try to decode it, this time it’s an integer.

iex(35)> <<srid::little-integer-size(32), _::binary>> = Base.decode16!("E6100000010C0029A5D93BDF4740E248962FD5975EC0")
<<230, 16, 0, 0, 1, 12, 0, 41, 165, 217, 59, 223, 71, 64, 226, 72, 150, 47, 213,
  151, 94, 192>>
iex(36)> srid
4326

This is in the 4120..4999 range and after short google search it turns out that it’s a standard encoding that is used by openstreetmap. Let’s check how far are we off at least?

No result

Ohh - maybe that’s the problem? Let’s try another one:

No result

:facepalm::facepalm::facepalm::facepalm::facepalm::facepalm::facepalm:

It turns out that the data is synthetic so the exact location won’t be shown. But it does not mean that it was not worth checking out how to do this. I definitely learned something along the way and I hope so are you when reading it.

comments powered by Disqus

Related Posts

Livebook on Steam Deck

Livebook on Steam Deck

This post may be updated in the future.

Steam deck is a modern machine. I wanted to test how long it takes to generate images using stable diffusion on it. Thanks to livebook this becomes trivial. The biggest problem is to install the package. I did it using ssh but it can also be done directly on deck.

Read More
SQL Commenter with Postgrex

SQL Commenter with Postgrex

A few years ago, I discovered sqlcommenter, a tool that enables adding trace context to SQL queries. This feature makes it possible to match database calls visible in PostgreSQL logs or any observability tool. At the time, I found several questions about this topic scattered across the internet, but no concrete solutions. I attempted to solve this myself and experimented with various approaches, but none of them proved viable for implementation in Elixir, particularly with Ecto.

Read More