How to convert a list of integers into a dynamic regex pattern that matches exactly those integers?

typescript
Ethan Jackson

Body:

I'm trying to dynamically generate a compact regular expression in Ruby based on a list of integers.

For example, given this input list:

    [1,2,3,4,5,6,7,9,10,11,12,13,14,15,17,18,19,20,21,22,23,25,26,27,28,29,30,31]

I would like to produce a regex like:

    /^([1-7]|9|1[0-5]|1[7-9]|2[0-3]|2[5-9]|3[0-1])$/

This regex should match exactly the numbers in the list, in the most compact form possible. For example:

1-7 gets compacted into [1-7]

10-15 into 1[0-5]

17-19 into 1[7-9]

and so on.
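To make "exactly" concrete, here is a small sanity check. It is only a sketch under assumptions: the helper name `matches_exactly?` and the 0..99 test window are illustrative choices, not part of Ruby or any library.

```ruby
# Hypothetical helper: true when `regex` accepts exactly `nums`
# among the integers in `window` (assumed default: 0..99).
def matches_exactly?(regex, nums, window = 0..99)
  window.select { |n| regex.match?(n.to_s) } == nums.uniq.sort
end

target = /^([1-7]|9|1[0-5]|1[7-9]|2[0-3]|2[5-9]|3[0-1])$/
nums = [1,2,3,4,5,6,7,9,10,11,12,13,14,15,17,18,19,20,21,22,23,25,26,27,28,29,30,31]

puts matches_exactly?(target, nums) # prints true
```

Any generated pattern can be passed through the same check, which makes it easy to compare a compact form against the verbose one.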

What I have tried:

I wrote the following Ruby code:

    def number_list_to_ranges(numbers)
      numbers.sort!
      ranges = []
      start = numbers.first
      prev = numbers.first
      numbers[1..].each do |n|
        if n == prev + 1
          prev = n
        else
          ranges << (start..prev)
          start = n
          prev = n
        end
      end
      ranges << (start..prev)
    end

    def range_to_regex(r)
      return r.begin.to_s if r.begin == r.end
      if r.begin >= 0 && r.end <= 9
        "[#{r.begin}-#{r.end}]"
      elsif r.begin >= 10 && r.end <= 99
        subranges = []
        (r.begin..r.end).each do |n|
          subranges << n
        end
        grouped = number_list_to_ranges(subranges)
        grouped.map do |subr|
          if subr.begin == subr.end
            subr.begin.to_s
          elsif subr.begin / 10 == subr.end / 10
            tens = subr.begin / 10
            units_start = subr.begin % 10
            units_end = subr.end % 10
            "#{tens}[#{units_start}-#{units_end}]"
          else
            subr.map(&:to_s).join('|')
          end
        end.join('|')
      else
        (r.begin..r.end).map(&:to_s).join('|')
      end
    end

    def generate_regex(numbers)
      ranges = number_list_to_ranges(numbers.uniq)
      parts = ranges.map { |r| range_to_regex(r) }
      "/^(" + parts.join('|') + ")$/"
    end

    nums = [1,2,3,4,5,6,7,9,10,11,12,13,14,15,17,18,19,20,21,22,23,25,26,27,28,29,30,31]
    puts generate_regex(nums)

Problem:

This code does not correctly compact the list into the desired compact regex form. Instead, it just prints a verbose list like:

    /^(1|2|3|4|5|6|7|9|10|11|12|13|14|15|17|18|19|20|21|22|23|25|26|27|28|29|30|31)$/

It doesn't group them into [1-7], 1[0-5], 1[7-9], etc.

Question:

How can I modify or improve this Ruby code to properly generate a compact regex from a list of integers?

Preferably:

Group continuous ranges into [start-end]

Handle tens nicely (e.g., 10-15 → 1[0-5])

Keep it readable and efficient

Any suggestions or better approaches?

Answer

Okay, I got this working with custom logic. The following code works, with these caveats:

  1. This approach works well for numbers up to 99.

  2. For numbers >99, it would need extension (e.g., 100–199 logic).

  3. You could further optimize to collapse cross-tens boundaries if needed (advanced).

    def generate_regex_parts(x)
      # Expects a sorted list of unique, non-negative integers.
      start = x[0]
      optimized = []
      (0...x.length).each do |n|
        q = x[n] / 10
        r = x[n] % 10
        nxt = x[n + 1] # nil once we reach the last element
        # A run ends at the last element, at a gap, or when x / 10 changes.
        next unless nxt.nil? || nxt - x[n] != 1 || nxt / 10 != q

        optimized << if start == x[n]
                       x[n].to_s # single number, e.g. 9
                     elsif q == 0
                       "[#{start % 10}-#{r}]" # e.g. [1-7]
                     else
                       "#{q}[#{start % 10}-#{r}]" # e.g. 1[0-5]
                     end
        start = nxt
      end
      optimized
    end
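The function above returns only the alternatives, so they still need to be joined and anchored. Below is a self-contained sketch of the same idea (split runs at gaps and at tens boundaries) written with `Enumerable#chunk_while`. The names `decade_parts` and `build_regex` are hypothetical, and the input is assumed to contain only non-negative integers. The sketch anchors with `\A` and `\z` rather than `^` and `$`, because in Ruby `^` and `$` match at line boundaries, so a string such as "1\nfoo" would otherwise slip through.

```ruby
# Hypothetical helper (not from the answer above): one regex alternative per
# run of consecutive numbers that share the same value of n / 10.
def decade_parts(numbers)
  numbers.uniq.sort
         .chunk_while { |a, b| b == a + 1 && a / 10 == b / 10 }
         .map do |run|
    first = run.first
    last = run.last
    prefix = first < 10 ? "" : (first / 10).to_s
    first == last ? first.to_s : "#{prefix}[#{first % 10}-#{last % 10}]"
  end
end

# Hypothetical wrapper: joins the alternatives and anchors the whole string.
def build_regex(numbers)
  Regexp.new("\\A(?:#{decade_parts(numbers).join('|')})\\z")
end

nums = [1,2,3,4,5,6,7,9,10,11,12,13,14,15,17,18,19,20,21,22,23,25,26,27,28,29,30,31]
regex = build_regex(nums)

puts regex.source
# \A(?:[1-7]|9|1[0-5]|1[7-9]|2[0-3]|2[5-9]|3[0-1])\z

matched = (0..99).select { |n| regex.match?(n.to_s) }
puts matched == nums # prints true
```

Because the prefix is simply `n / 10`, a run such as 100, 101, 102 comes out as `10[0-2]`. That is correct but not maximally compact, which is where the cross-tens optimization mentioned in the caveats would come in.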
