Published 2026-03-23
As part of migrating my blog from Python to Go, I wanted to gather anonymized statistics about which countries my visitors were coming from. I quickly selected IPinfo as my GeoIP database of choice. They approximate the location of IP addresses by pinging them from servers spread around the world, triangulating locations from response times.
They also provide an ipinfo.csv.gz file for download that is updated daily. This was great for me as I wanted a local in-memory database; I didn't want to rely on sending GeoIP lookups to an external API service to get the visitor's country. This matters because the country is determined while the requested page is being served; waiting on an external API call would make every page load slower.
This "small" feature ended up requiring a bit of optimization due to the number of entries in the CSV file. The process of optimizing turned into quite an adventure for me and I would like to share it with you. Not to spoil anything, but I make some pretty impressive optimizations, if I may say so myself.
Starting with some numbers: the ipinfo.csv.gz available for download is about 30MB. The unpacked .csv file is ~445MB and contains around 5.7 million lines, where each line holds one subnet and its corresponding country. The format of the CSV looks something like this:
```
network,country,country_code,continent,continent_code,asn,as_name,as_domain
1.0.0.0/24,Australia,AU,Oceania,OC,AS13335,"Cloudflare, Inc.",cloudflare.com
1.0.1.0/24,China,CN,Asia,AS,,,
1.0.2.0/23,China,CN,Asia,AS,,,
1.0.4.0/22,Australia,AU,Oceania,OC,AS38803,Gtelecom Pty Ltd,gtelecom.com.au
1.0.8.0/21,China,CN,Asia,AS,,,
1.0.16.0/24,Japan,JP,Asia,AS,AS2519,ARTERIA Networks Corporation,arteria-net.com
1.0.17.0/24,Japan,JP,Asia,AS,,,
1.0.18.0/23,Japan,JP,Asia,AS,,,
1.0.20.0/22,Japan,JP,Asia,AS,,,
```
I only care about the network, country name (and possibly country code) columns. Also, my Internet provider only provides IPv4, so I only have to keep the ~1.3 million IPv4 entries. How do I get this data into my Go program, and how do I make the search fast? Let's tackle these problems one at a time.
This was my initial implementation. All data lives inside the CountryIPData struct, which is loaded on program startup. The struct has two methods: parseIPInfoCSV() runs on struct initialization and puts all IPv4 subnets in a map[netip.Prefix]string, where the string is the country name, and AddrCountry() performs the IP-to-country lookup.
These are the relevant parts of the code:
```go
type CountryIPData struct {
	subnetCountries map[netip.Prefix]string
}

func (c *CountryIPData) parseIPInfoCSV(ipInfoCSV *os.File) error {
	subnetCountries := make(map[netip.Prefix]string, 1_500_000)
	scanner := bufio.NewScanner(ipInfoCSV)
	for scanner.Scan() {
		fields := strings.Split(scanner.Text(), ",")
		prefix := fields[0]
		subnet, err := netip.ParsePrefix(prefix)
		if err != nil {
			if fields[0] == "network" {
				continue // first line contains CSV headers, first header is "network"
			}
			return fmt.Errorf("line %s: %v", scanner.Text(), err)
		} else if subnet.Addr().Is6() {
			break // We only care about IPv4 subnets, they are listed first
		}
		country := fields[1]
		subnetCountries[subnet] = country
	}
	c.subnetCountries = subnetCountries
	return nil
}

func (c *CountryIPData) AddrCountry(ip netip.Addr) (country string) {
	for subnet, country := range c.subnetCountries {
		if subnet.Contains(ip) {
			return country
		}
	}
	return ""
}
```
A relatively straightforward implementation. Turning each subnet into a netip.Prefix gives me access to the convenient Contains() method, making the code easy to follow.
I measured the memory usage and average lookup time using this benchmark function:
```go
func BenchmarkIPLookup(b *testing.B) {
	countryIP, err := NewCountryIPData()
	if err != nil {
		fmt.Printf("new CountryIPData: %v\n", err)
		return
	}
	runtime.GC()
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	alloc := m.Alloc
	b.Logf(
		"countryIP: %d MB (%d bytes per entry)\n",
		alloc/1024/1024, alloc/uint64(countryIP.Length()),
	)
	octet := 0
	for b.Loop() {
		// 0.0.0.0, 1.1.1.1, 2.2.2.2, 3.3.3.3, etc
		addr := netip.MustParseAddr(fmt.Sprintf("%d.%d.%d.%d", octet, octet, octet, octet))
		countryIP.AddrCountry(addr)
		octet++
		if octet > 239 {
			octet = 0
		}
	}
	b.Log("avg time per lookup:", b.Elapsed()/time.Duration(b.N))
}
```
Checking the map memory usage, I found that the 1.3M+ entries use ~223MB of memory. That is 174 bytes per entry, on average. Considering that the .csv file itself is ~445MB we have saved some space, but I expected the memory usage to be lower considering we are only processing the IPv4 entries, not the full file.
Since CountryIPData is the only variable we load in the benchmark, we can assume it accounts for most of the program's memory usage. It's not a perfect method, but it gives a good baseline. It also helps to manually run the garbage collector (runtime.GC()) so that unused data is freed before we read the memory stats.
Benchmarking AddrCountry() gives an average result of ~10 milliseconds per lookup, which is fine but not great. I would probably get a similar result by querying the IPinfo API. To get a fair result, I evenly spread the IP-address lookups across the full IPv4 unicast range. Because version 1 uses a map, the iteration order is randomized on every lookup, so this spread helps keep the benchmark results consistent.
So, we have a way to test our different implementations. Version 1 works, but we can definitely do better. Where to start?
I chose to first optimize the 10ms lookup time. Since I'm doing the lookup as part of serving each page, it makes the page load that much slower. The slowness comes from the fact that each lookup potentially has to go through all 1.3M+ entries before a match is found.
After some thinking, I figured I could split the map[netip.Prefix]string into a nested map where the outer map groups prefixes by the first byte of the IP address: map[byte]map[netip.Prefix]string. This separates the data into ~240 nested maps, the benefit being that each nested map contains far fewer prefixes.
This is what the v2 code looks like:
```go
type CountryIPData struct {
	subnetCountriesByFirstByte map[byte]map[netip.Prefix]string
}

func (c *CountryIPData) parseIPInfoCSV(ipInfoCSV *os.File) error {
	c.subnetCountriesByFirstByte = make(map[byte]map[netip.Prefix]string)
	scanner := bufio.NewScanner(ipInfoCSV)
	for scanner.Scan() {
		fields := strings.Split(scanner.Text(), ",")
		prefix := fields[0]
		subnet, err := netip.ParsePrefix(prefix)
		if err != nil {
			if fields[0] == "network" {
				continue // first line contains CSV headers, first header is "network"
			}
			return fmt.Errorf("line %s: %v", scanner.Text(), err)
		} else if subnet.Addr().Is6() {
			break // We only care about IPv4 subnets
		}
		// Create nested map if it does not exist yet
		firstByte := subnet.Addr().As4()[0]
		if _, exist := c.subnetCountriesByFirstByte[firstByte]; !exist {
			c.subnetCountriesByFirstByte[firstByte] = make(map[netip.Prefix]string)
		}
		// Add prefix/country to nested map
		country := fields[1]
		c.subnetCountriesByFirstByte[firstByte][subnet] = country
	}
	return nil
}
```
The CSV parser now places each subnet into a nested map based on the first byte. This extra map layer increases the memory usage from ~223MB to ~227MB. This is not optimal, but will do for now.
This is how the new IP lookup does its job:
```go
func (c *CountryIPData) AddrCountry(ip netip.Addr) (country string) {
	firstByte := ip.As4()[0]
	for firstByte > 0 {
		subnetCountries, exist := c.subnetCountriesByFirstByte[firstByte]
		if !exist {
			firstByte -= 1
			continue
		}
		for subnet, country := range subnetCountries {
			if subnet.Contains(ip) {
				return country
			}
		}
		return ""
	}
	return ""
}
```
The AddrCountry() lookup got more complicated because the first byte of the user's IP address might not perfectly match an entry. Funnily enough, the only instance of this is the 28.0.0.0/7 IP range assigned to the United States Department of Defense. The last IP address in that range is 29.255.255.255.
If a user visits with that IP address, the lookup searches for key 29 in the outer map, but no such key exists. This is why the lookup occurs inside a for loop; if the key does not exist, firstByte is decremented until a match is found. The second iteration of the loop will therefore try key 28 and find a matching key. Its value is a nested map[netip.Prefix]string, so the lookup can now proceed like it did in version 1, returning the subnet's country name.
The lookup time now averages 46 microseconds. Compared to the 10 milliseconds achieved by version 1, that's a massive 217x gain in performance! This is thanks to each nested map containing somewhere between 1k and 50k entries instead of 1.3M+. Performing a lookup on page serve is now so fast that it's practically unnoticeable.
But I'm not yet satisfied. Is the 227MB memory usage a lot, or is this what I have to expect from keeping so many entries in memory?
I wanted to lower the memory usage, so I focused on the value stored under each key: the country name. For every subnet we are storing the country as a string. With 248 countries in the file and 1.3M+ entries, each country's name is duplicated over 5,000 times on average. I tested the impact of this by leaving all strings empty. The memory usage dropped from 227MB to 116MB, so there is real potential to save on memory usage.
I finally settled on storing the country name strings in a separate map[countryCode]countryName where the countryCode key is a [2]byte byte-array (CN for China, for example). I also had to alter the old subnetCountries map to instead store the 2-byte country code as its value. A separate map lookup on countriesByCC is now required to get the full country name.
This is the new code. Some error handling had to be cut for brevity:
```go
type CountryIPData struct {
	subnetCountryCodes map[byte]map[netip.Prefix][2]byte
	countriesByCC      map[[2]byte]string // map[countryCode]countryName
}

func (c *CountryIPData) parseIPInfoCSV(ipInfoCSV *os.File) error {
	var countryCode [2]byte
	scanner := bufio.NewScanner(ipInfoCSV)
	for scanner.Scan() {
		fields := strings.Split(scanner.Text(), ",")
		subnet, _ := netip.ParsePrefix(fields[0]) // error handling skipped for brevity

		// Populate country name in the separate map
		copy(countryCode[:], fields[2])
		if _, exist := c.countriesByCC[countryCode]; !exist {
			country := fields[1]
			c.countriesByCC[countryCode] = country
		}

		// Populate subnet entries with the country code as value
		firstByte := subnet.Addr().As4()[0]
		if _, exist := c.subnetCountryCodes[firstByte]; !exist {
			c.subnetCountryCodes[firstByte] = make(map[netip.Prefix][2]byte)
		}
		c.subnetCountryCodes[firstByte][subnet] = countryCode
	}
	return nil
}

func (c *CountryIPData) AddrCountry(ip netip.Addr) (country string) {
	firstByte := ip.As4()[0]
	for firstByte > 0 {
		subnetCountries, exist := c.subnetCountryCodes[firstByte]
		if !exist {
			firstByte -= 1
			continue
		}
		for subnet, countryCode := range subnetCountries {
			if subnet.Contains(ip) {
				return c.countriesByCC[countryCode] // extra map lookup
			}
		}
		return ""
	}
	return ""
}
```
So, fewer strings but an extra key lookup. What's the verdict? Well, memory usage dropped from ~227MB to ~100MB. That is an average of 78 bytes per entry, compared to 177 bytes in the previous version. This is even better than my earlier test using empty strings; I'm guessing each string adds some overhead that a byte array does not. The lookup time now averages 47 microseconds. So this is yet another great improvement! We halved the memory usage and the average lookup only took one microsecond longer. But can we do better? Yes!
I have to admit that I am very proud of this version, making some fantastic gains and improvements. Based on the string optimizations in v3, we can deduce that most of the 100MB of memory still in use is taken up by the netip.Prefix keys. Each entry takes 78 bytes of memory on average. An IPv4 address is 4 bytes and the country code is another 2 bytes, so where does the rest of the memory go? (A netip.Prefix is internally much larger than 4 bytes, and each map entry carries bucket overhead on top of that.)
Unaware of more sciency options, I figured that instead of using a netip.Prefix and its convenient Contains() method, I could just convert the first and last IP address of each subnet into separate uint32s. I can then compare the user IP to the first and last address of each subnet to check for a match; the user IP must be greater than or equal to the first address of the subnet while also being less than or equal to the last address. If true, it's a match.
By my count, each entry would store 10 bytes of data. My Go LSP tells me it's 12 bytes, so there is some alignment padding going on. But still, 12 bytes per entry is much less than our current 78. That is promising.
This is how I implemented v4:
```go
type CountryIPData struct {
	subnetCountries []IPv4SubnetCountry // now a slice
	countriesByCC   map[[2]byte]string
}

type IPv4SubnetCountry struct {
	netAddr, lastAddr uint32  // 8 bytes
	countryCode       [2]byte // 2 bytes (+2 bytes padding)
}

func (c *CountryIPData) parseIPInfoCSV(ipInfoCSV *os.File) error {
	var (
		subnetCountries = make([]IPv4SubnetCountry, 0, 1_500_000)
		scanner         = bufio.NewScanner(ipInfoCSV)
		countryCode     [2]byte
	)
	for scanner.Scan() {
		fields := strings.Split(scanner.Text(), ",")
		subnet, _ := netip.ParsePrefix(fields[0]) // error handling skipped for brevity

		var netAddrBytes [4]byte = subnet.Addr().As4()
		netAddr := binary.BigEndian.Uint32(netAddrBytes[:])

		copy(countryCode[:], fields[2])
		countryName := fields[1]
		c.countriesByCC[countryCode] = countryName

		subnetCountries = append(subnetCountries, IPv4SubnetCountry{
			netAddr:     netAddr,
			lastAddr:    netAddr + 1<<(32-subnet.Bits()) - 1,
			countryCode: countryCode,
		})
	}
	// Create a new slice containing the exact number of entries
	subnets := make([]IPv4SubnetCountry, len(subnetCountries))
	copy(subnets, subnetCountries)

	// Sort entries in ascending order by the netAddr field
	slices.SortFunc(subnets, func(a, b IPv4SubnetCountry) int {
		return cmp.Compare(a.netAddr, b.netAddr)
	})
	c.subnetCountries = subnets
	return nil
}
```
We can't convert a netip.Prefix directly into a uint32; it first has to be converted into a [4]byte. In this example the subnet is 1.0.1.0/24. The network address 1.0.1.0 is converted to the uint32 16777472.
To figure out the last address of the 1.0.1.0/24 subnet in the above code, we first have to figure out the size of the subnet. We do that by subtracting subnet.Bits() from 32: 32 - 24 = 8. The host portion of our subnet is 8 bits.
To convert those 8 bits into a decimal number, we bit-shift the integer 1 to the left eight times. Each bit-shift performs a "times 2" operation, giving this result: 1 << 8 = 256. This is what the bit-shifts look like in binary format:
```
000000001 = initial value (integer 1)
000000010 = 1st iteration (integer 2)
000000100 = 2nd iteration (integer 4)
000001000 = 3rd iteration (integer 8)
000010000 = 4th iteration (integer 16)
000100000 = 5th iteration (integer 32)
001000000 = 6th iteration (integer 64)
010000000 = 7th iteration (integer 128)
100000000 = 8th iteration (integer 256)
```
We have figured out that the subnet contains 256 unique IP addresses. With the network address being one of those addresses, we have to subtract 1 from 256. The last IP address of the subnet is therefore 1.0.1.0 + 255 = 1.0.1.255.
Compressing all of this into a single line turns it into this:
```go
lastAddr := netAddr + 1<<(32-subnet.Bits()) - 1
```
Now that both the network address and the last address of the subnet have been calculated, we can put them in a struct that also contains the country code. All structs are sorted and stored in a slice, ready for IP lookups performed by the AddrCountry() method.
Bit-shifting is among the fastest operations a CPU can perform, because it's one of the simplest instructions to execute. Integer division, by comparison, is one of the more expensive arithmetic operations (multiplication sits somewhere in between). To halve a number, a classic trick is to bit-shift the variable one step to the right: 30 >> 1 = 15. The same goes for doubling, just shift one bit to the left: 15 << 1 = 30. (Compilers typically apply this "strength reduction" automatically when you multiply or divide by a power of two.)
Now that the subnetCountries map has been changed to a slice, I should probably improve the IP lookup process too. My immediate thought was to loop through all subnet entries in the slice until a match is found, but with more than a million entries there must be a more efficient way.
This is when I remembered reading about the binary search algorithm. The name sounded scary, but I figured I would check it out anyway. To my surprise, it looked relatively intuitive!
To explain the algo, imagine you are holding a book; you want to open page 60 in the book, but you don't know where that page is. The algorithm starts by opening the book right in the middle. In a 100-page book, this opens page 50. We now know that page 60 is in the upper half of the book, so we can ignore the first 50 pages. At this first step we have already ruled out 50% of the pages in the book.
We run the algorithm again, but this time opening the page that is halfway between pages 50 and 100. We get page 75. As page 60 is lower than 75, we can now ignore higher page numbers. Running the algo again, we open the page halfway between 50 and 75. We get page 63. We run it again for 50 and 63 and we get page 57. Run it again for 57 and 63 and we get page 60!
With each iteration halving the size of the search area, we can efficiently search through any long list of entries. The only caveat is that the entries must be sorted. In my case, I have to make sure the entries are sorted by the netAddr value.
The code below is my implementation of this binary search:
```go
func (c *CountryIPData) AddrCountry(ip netip.Addr) (country string) {
	var (
		addrBytes = ip.As4()
		addrInt   = binary.BigEndian.Uint32(addrBytes[:])
		low       = 0
		needle    = len(c.subnetCountries) >> 1 // halfway point
		high      = len(c.subnetCountries)
	)
	for range 30 {
		subnet := c.subnetCountries[needle]
		if high-low == 1 && low > 0 {
			return "" // no matching country, e.g. RFC1918
		} else if addrInt < subnet.netAddr {
			high = needle
			needle = (low + high) >> 1
			continue
		} else if addrInt > subnet.lastAddr {
			low = needle
			needle = (low + high) >> 1
			continue
		}
		return c.countriesByCC[subnet.countryCode]
	}
	return ""
}
```
At the start of the function, we set the low, needle and high variables. low is the first index in the slice, high is one past the last index, and needle is the index right in the middle of the slice.
Each iteration starts by fetching the IPv4SubnetCountry struct at the needle index. If our IP address is lower than the subnet's network address, we know we need to search the lower part of the slice. We then set high = needle and pick a new needle index based on the new low/high values: needle = (low+high) >> 1.
Likewise, if our IP-address is higher than the subnets last address we know we need to search the upper parts of the slice, so we make low = needle, then calculate needle to be the new halfway point.
This process continues until a matching subnet is found, at which point we do a countryCode-to-countryName lookup and return the result. If no matching subnet exists, the algorithm keeps iterating until the low and high values are only one step apart. This happens if the traffic belongs to a private IP address (RFC1918) or to an IP range that has not been assigned a country.
With 1.3M entries in the slice, the algorithm needs about 20-21 iterations to find a match. I manually capped the for loop at 30 iterations, just to avoid it running forever in case of some unforeseen bug.
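As a side note, Go's standard library can express the same search: sort.Search (or, since Go 1.21, slices.BinarySearchFunc) finds the first index for which a predicate holds and does the low/high/needle bookkeeping internally. A sketch over the same kind of sorted entries, with hypothetical data:

```go
package main

import (
	"fmt"
	"sort"
)

type IPv4SubnetCountry struct {
	netAddr, lastAddr uint32
	countryCode       [2]byte
}

// lookup finds the country code for ip in a slice sorted by netAddr.
func lookup(entries []IPv4SubnetCountry, ip uint32) ([2]byte, bool) {
	// First entry whose lastAddr is >= ip; sort.Search runs the
	// same halving loop as the hand-rolled version.
	i := sort.Search(len(entries), func(i int) bool {
		return entries[i].lastAddr >= ip
	})
	if i < len(entries) && entries[i].netAddr <= ip {
		return entries[i].countryCode, true
	}
	return [2]byte{}, false // RFC1918 or unassigned range
}

func main() {
	entries := []IPv4SubnetCountry{ // hypothetical, already sorted by netAddr
		{netAddr: 16777216, lastAddr: 16777471, countryCode: [2]byte{'A', 'U'}}, // 1.0.0.0/24
		{netAddr: 16777472, lastAddr: 16777727, countryCode: [2]byte{'C', 'N'}}, // 1.0.1.0/24
	}
	cc, ok := lookup(entries, 16777500) // 1.0.1.28
	fmt.Println(string(cc[:]), ok)      // CN true
}
```

The hand-rolled loop and sort.Search do the same O(log n) work; the stdlib version mostly saves you from off-by-one bugs in the bookkeeping.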
What's the performance verdict? The previous version achieved a memory usage of 100MB and a lookup time of ~47 microseconds. Version 4 achieved a memory usage of 15MB and a lookup time of ~290 nanoseconds (0.29 microseconds)! This is another 162x speedup and I'm honestly quite proud of myself for implementing this.
I was perfectly satisfied with the v4 implementation and had almost finished writing this article when I realized that there exists one more optimization. If you looked carefully at the example output from the ipinfo.csv file, you could see that multiple adjacent subnets were assigned to the same country. This is a common pattern throughout the entire file.
Due to how subnetting works, these had to be specified as separate subnets in the CSV file. As I am now using integers to represent first/last IP-addresses in each range, I can easily extend one range to also include the next as long as they are adjacent. And the next one after that. So let's make one final optimization at the end of the parseIPInfoCSV() function:
```go
func (c *CountryIPData) parseIPInfoCSV(ipInfoCSV *os.File) error {
	[...]
	slices.SortFunc(subnetCountries, func(a, b IPv4SubnetCountry) int {
		return cmp.Compare(a.netAddr, b.netAddr)
	})
	// Countries tend to use IP-ranges right after each other.
	// Combining adjacent ranges further reduces the number of entries.
	combinedSubnets := make([]IPv4SubnetCountry, 0, len(subnetCountries))
	combIdx := -1
	for i := range subnetCountries {
		if i == 0 {
			combinedSubnets = append(combinedSubnets, subnetCountries[i])
			combIdx++
			continue
		}
		this := subnetCountries[i]
		prev := subnetCountries[i-1]
		if prev.countryCode != this.countryCode || prev.lastAddr+1 != this.netAddr {
			// Different countries, or ranges not adjacent
			combinedSubnets = append(combinedSubnets, this)
			combIdx++
			continue
		}
		// Extend the existing IP-range instead of adding another entry
		combinedSubnets[combIdx].lastAddr = this.lastAddr
	}
	copyCombined := make([]IPv4SubnetCountry, len(combinedSubnets))
	copy(copyCombined, combinedSubnets)
	c.subnetCountries = copyCombined
	return nil
}
```
This optimization more than halved the number of entries, from 1.34M to 607K. The memory usage also halved, from ~15MB to ~7MB.
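To make the merge condition concrete, here is a tiny standalone sketch with hypothetical entries: two adjacent same-country ranges collapse into one, while a country change or a gap starts a new entry. (This version extends the last merged entry directly, which gives the same result as the index-based code above.)

```go
package main

import "fmt"

type entry struct {
	netAddr, lastAddr uint32
	countryCode       [2]byte
}

// mergeAdjacent collapses runs of adjacent same-country ranges.
// The input must already be sorted by netAddr.
func mergeAdjacent(in []entry) []entry {
	out := make([]entry, 0, len(in))
	for i, e := range in {
		if i > 0 {
			last := &out[len(out)-1]
			if last.countryCode == e.countryCode && last.lastAddr+1 == e.netAddr {
				last.lastAddr = e.lastAddr // extend instead of appending
				continue
			}
		}
		out = append(out, e)
	}
	return out
}

func main() {
	jp := [2]byte{'J', 'P'}
	au := [2]byte{'A', 'U'}
	in := []entry{
		{100, 199, jp},
		{200, 299, jp}, // adjacent to previous, same country: merged
		{300, 399, au}, // adjacent but different country: kept
		{500, 599, au}, // same country but a gap before it: kept
	}
	fmt.Println(len(mergeAdjacent(in))) // 3
}
```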
By iterating through various solutions for optimizing the memory usage and lookup-time of my in-memory GeoIP database, we got the memory usage down from ~223MB in v1 to ~7MB in v5. The lookup time plummeted from ~10 milliseconds in v1 to ~290 nanoseconds in v5.
Honestly, this was a super fun exercise for me. I usually don't have to think about these sorts of optimizations in my daily work, so it was a great learning experience. Now that I have gotten the memory usage way down, I am comfortable adding (in the future) the 4M+ IPv6 entries that are currently skipped. This will happen the day I get an Internet provider that supports IPv6.
If you want to look at the different versions in full, feel free to check them out here: https://github.com/emieli/go-country-ip.
Finally, I hope you enjoyed reading this. If you did, I thank you for your time and hope to see you again soon.