How to convert a list of integers into a dynamic regex pattern that matches exactly those integers?

Body:

I'm trying to dynamically generate a compact regular expression in Ruby based on a list of integers.

For example, given this input list:

[1,2,3,4,5,6,7,9,10,11,12,13,14,15,17,18,19,20,21,22,23,25,26,27,28,29,30,31]

I would like to produce a regex like:

/^([1-7]|9|1[0-5]|1[7-9]|2[0-3]|2[5-9]|3[0-1])$/

This regex should match exactly the numbers in the list, in the most compact form possible. For example:

1-7 gets compacted into [1-7]

10-15 into 1[0-5]

17-19 into 1[7-9]

and so on.

What I have tried:

I wrote the following Ruby code:

def number_list_to_ranges(numbers)
  numbers.sort!
  ranges = []
  start = numbers.first
  prev = numbers.first

  numbers[1..].each do |n|
    if n == prev + 1
      prev = n
    else
      ranges << (start..prev)
      start = n
      prev = n
    end
  end
  ranges << (start..prev)
end

def range_to_regex(r)
  return r.begin.to_s if r.begin == r.end

  if r.begin >= 0 && r.end <= 9
    "[#{r.begin}-#{r.end}]"
  elsif r.begin >= 10 && r.end <= 99
    subranges = []
    (r.begin..r.end).each do |n|
      subranges << n
    end

    grouped = number_list_to_ranges(subranges)
    grouped.map do |subr|
      if subr.begin == subr.end
        subr.begin.to_s
      elsif subr.begin / 10 == subr.end / 10
        tens = subr.begin / 10
        units_start = subr.begin % 10
        units_end = subr.end % 10
        "#{tens}[#{units_start}-#{units_end}]"
      else
        subr.map(&:to_s).join('|')
      end
    end.join('|')
  else
    (r.begin..r.end).map(&:to_s).join('|')
  end
end

def generate_regex(numbers)
  ranges = number_list_to_ranges(numbers.uniq)
  parts = ranges.map { |r| range_to_regex(r) }
  "/^(" + parts.join('|') + ")$/"
end



nums = [1,2,3,4,5,6,7,9,10,11,12,13,14,15,17,18,19,20,21,22,23,25,26,27,28,29,30,31]
puts generate_regex(nums)

Problem:

This code does not compact the list into the desired regex form. Instead, it just prints a verbose list like:

/^(1|2|3|4|5|6|7|9|10|11|12|13|14|15|17|18|19|20|21|22|23|25|26|27|28|29|30|31)$/

It doesn't group them into [1-7], 1[0-5], 1[7-9], etc.

Question:

How can I modify or improve this Ruby code to properly generate a compact regex from a list of integers?

Preferably:

Group continuous ranges into [start-end]

Handle tens nicely (e.g., 10-15 → 1[0-5])

Keep it readable and efficient

Any suggestions or better approaches?

Answer

Okay, I got this working with custom logic. The following code works, with these caveats:

  1. This approach works well for numbers up to 99.

  2. For numbers >99, it would need extension (e.g., 100–199 logic).

  3. You could further optimize to collapse cross-tens boundaries if needed (advanced).

def generate_regex_parts(x)
  # Assumes x holds integers in 0..99; sort and de-duplicate defensively.
  x = x.uniq.sort
  optimized = []
  start = x[0]

  x.each_with_index do |current, n|
    following = x[n + 1]
    q = current / 10

    # Keep extending the run while the next number is consecutive
    # and shares the same tens digit.
    next if following && following - current == 1 && following / 10 == q

    # The run ends here: emit a single number or "<tens>[a-b]".
    prefix = q.zero? ? "" : q.to_s
    optimized << if start == current
                   current.to_s
                 else
                   "#{prefix}[#{start % 10}-#{current % 10}]"
                 end
    start = following
  end

  optimized
end

# Wrap the parts in anchors and a group to get the final pattern.
def generate_regex(numbers)
  "/^(" + generate_regex_parts(numbers).join('|') + ")$/"
end
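
The caveat about numbers above 99 could be addressed by grouping on everything except the last digit instead of hard-coding the tens. The following is only a sketch of that idea, not part of the original answer: the method name compact_number_regex is hypothetical, it assumes non-negative integers, and it uses \A and \z anchors rather than ^ and $ so that multi-line input cannot slip through.

```ruby
# Hypothetical helper (an assumption, not from the answer above): groups
# non-negative integers by their leading digits (n / 10) and compresses
# runs of consecutive last digits into character classes.
def compact_number_regex(numbers)
  raise ArgumentError, "numbers must not be empty" if numbers.empty?

  parts = numbers.uniq.sort.group_by { |n| n / 10 }.flat_map do |prefix, group|
    lead = prefix.zero? ? "" : prefix.to_s
    # Split the last digits into runs of consecutive values.
    group.map { |n| n % 10 }.slice_when { |a, b| b != a + 1 }.map do |run|
      run.size == 1 ? "#{lead}#{run.first}" : "#{lead}[#{run.first}-#{run.last}]"
    end
  end

  Regexp.new("\\A(?:#{parts.join('|')})\\z")
end

nums = [1,2,3,4,5,6,7,9,10,11,12,13,14,15,17,18,19,20,21,22,23,25,26,27,28,29,30,31]
regex = compact_number_regex(nums)
puts regex.source
# \A(?:[1-7]|9|1[0-5]|1[7-9]|2[0-3]|2[5-9]|3[0-1])\z

# Sanity check: the pattern should accept exactly the input numbers.
puts (0..200).select { |n| regex.match?(n.to_s) } == nums
# true
```

This also handles values such as 100..102 (producing 10[0-2]), but it does not merge across prefixes, so a range like 100..199 would still come out as ten alternatives rather than 1[0-9][0-9].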
