Tag Archives: rails

Rails 4 Nested Select Query Using ActiveRecord

Just skip to the last part if you are in a rush 😉

Last year I did an upgrade for a legacy Rails website, from version 3.2 to 4.2. One daunting task was to upgrade the ActiveRecord RubyGem since the API changed drastically in the new release. Luckily there was a RubyGem built specifically for preserving the functionality using the deprecated API, at least for now. To prepare for the inevitable, I want to write down some of the changes made recently for future reference so I don’t have to search through the documentation and Stackoverflow again.

What’s the problem?

We have a method that returns a report that aggregates on some data. 7 years ago when this application was built, there wasn’t much data so this was fine but now the report won’t load for an obvious reason: the performance of the query to calculate col2 was so bad given the current data volume. Therefore, we have to optimize it.

def self.get_summary(params)
  params[:fields] = [
	  "column1 as col1",	
      "GROUP_CONCAT(DISTINCT(column2) ORDER BY colum2 SEPARATOR ',') as col2",
      "sum(column3) as col3"]
  args = parse_params(params)
  self.find(:all, :select => args[:select],
		    :condition => args[:conditions],
		    :group => 'col1',
		    :from => "#{self.table_name}")
end
Rails 3 query ActiveRecord style

The above Rails code translate to the following SQL query:

select 
    colum1 as col1,
    GROUP_CONCAT(DISTINCT(column2) ORDER BY column2 SEPARATOR ',') as col2,
	sum(column3) as col3
from table1
where lastUpdatedDate between '20XX-01-XX' and '20XX-02-XX'
AND (column3 <> 0)
GROUP BY col1;
Rails 3 query in SQL

What’s the solution?

We decided to use a nested query to group by both the col1 and column2 first and select from that result:

select 
    col1,
    GROUP_CONCAT(DISTINCT(column2) ORDER BY column2 SEPARATOR ',') as col2,
	sum(col3)
from (
	select 
		colum1 as col1,
		column2,
		sum(column3) as col3
	from table1
	where lastUpdatedDate between '20XX-01-XX' and '20XX-02-XX'
	AND (column3 <> 0)
	GROUP BY col1, column2;
) a
GROUP BY col1;
Optimized query

Works like a charm except that I’m having problem translate this back to Rails code since I haven’t worked with these Rails code for almost a year now. I did some research but couldn’t find any straight answers to my question so I had to switch to trial and error by composing different API combinations and comparing the logged query with the desired one.

It turned out this one page in the documentation has all you need but you need to be creative to combine them.

Here’s how

  • find all with :select can be replaced by the new .select() API.
  • :conditions can be replaced by .where() API.
  • :group can be replace by .group() API.
  • :from can be replaced by .from() API.

The nested part took me some time to figure out. It turned it’s just the .from() API.

def self.get_summary(params)
	# nested inner query fields.
	params[:fields] = [
	  "column1 as col1",	
      "column2",
      "sum(column3) as col3"]
	args = parse_params(params)
	nested_query_result = self.select(args[:select])
	                          .where(args[:conditions])
	                          .group('col1, column2')
	# outer query select columns.
	params[:fields] = [
		"col1",
		"GROUP_CONCAT(column2 ORDER BY column2 SEPARATOR ',') as col2",
		"sum(col3)"]
	args = parse_params(params)
	results = self.select(args[:select])
	              .from(nested_query_results)
	              .group('col1')
	return result
end

In summary, just do the normal select for the inner query and save the result into a variable inner_result. Then pass it to the outer query’s .from() API.