ruby: [Matrix] Thoughts: Do not encourage bloated `initialize` methods

Could we have a discussion about this? I think typically this is bad behavior and it’s much better design to do the work when methods are called. This pattern can lead to all sorts of performance problems latter (doing too much work). It’s also a pretty serious violation of SRP.

Thoughts?

Originally filed against the matrix actual exercise: https://github.com/exercism/website-copy/pull/1051

[Edit maud] The current solution in the mentor notes is:

class Matrix
  attr_reader :rows, :columns
  def initialize(matrix_as_string)
    @rows = extract_rows(matrix_as_string)
    @columns = rows.transpose
  end

  private
  def extract_rows(string)
    string.each_line.map { |line| line.split.map(&:to_i) }
  end
end

About this issue

Original URL
State: closed
Created 5 years ago
Comments: 18 (15 by maintainers)

Most upvoted comments

Let’s try to wrap this conversation up.

My conclusion from the discussion is that the Matrix solution with everything defined in the initializer, is maybe not per se wrong in this particular exercise, but approving an approach that can have repercussions in other situations is misleading for the student.
Matrix is a Level 2 exercise, and the discussion that is provoked by computing the rows and the columns in the initializer, is not appropriate for this level. There are other problems with this exercise as well. For instance, Ruby already has a Matrix class, and we don’t want to ask students to reimplement existing features (at least, not in core exercises).

All of the above leads to the conclusion that Matrix should be removed from the core progression. On the other hand, it offers some features that we badly need on Level 2:

a simple multiline string (or at least something else than a single line string)
private vs public
map with Symbol#to_proc
attr_reader
memoization

At the moment, there are no other exercises (that I know of) that can fill the gap if we remove Matrix from the core. Also, Matrix is one of only 2 exercises on Level 2 (Series being the other one, following Matrix). Therefore, I propose to leave the things as they are. And try to find (or create) one or more other exercises that can fill the gap that is currently filled by Matrix. Albeit in a controversial way.

Given all this, can you, @yyyc514, or anyone else not live with this solution temporarily? If not, please explain why.

Also, suggestions for other exercises that can replace Matrix are very welcome.

TL;DR The current Matrix exercise is problematic in multiple ways. Let’s find a replacement for the Matrix exercise as a core asap. In the meantime, approve both approaches: either calculating rows and columns in the initializer, or defining separate methods.

emcoding on Jun 26, 2019

One solution is to have a static constructor (factory) called something like from_string that does the work of extract_… such that the real constructor only does assignment.

I don’t think this is a great solution because it doesn’t address the performance problem. One can easily imagine a matrix with a variety of methods and I just want to load a bunch of matrixes from a text file and then print out only the LARGE ones. Imagine we had a slightly more parseable format (I just realized it’d have to use newlines different, but you still get the idea). IE:

100x100;raw_matrix_data_here_in_previous_format_we_already_familiar_with"

Now imagine I have a 100 million matrixes on disk. I want to write code like this:

@matrixs = File.open("input")
  .each_line.map { |x| Matrix.new(x) }
  .select {|x| x.width == 100 && x.height == 100 }
  .map(&:print)

Or you could easily imagine writing it to use less RAM… the point is if there is only a single matrix that is 100x100… other than the disk IO and the cost of width and height (I presume I’d have to parse the first part of the raw string) I can mostly ignore the other 99 million matrixes. They don’t need to be fully parsed, rows and columns don’t need to be computed at all.

This is of course lazy loading (and not a new concept), but I think you get this for “free” when you keep init small and push responsibilities down into their own methods.

joshgoebel on Jun 4, 2019

Do you happen to know a Ruby exercise where doing some work in the initializer is a good idea?

In the realm of Exercism exercises I can think of OTTOMH I think init should:

setup variables and initial object state
perhaps do some very simple input sanitization
and not much else

In general if 80% of your class’s work is happening inside the initializer you’re doing it all wrong. (ie, this Matrix example)

My background in Rails probably betrays me a little here, but I think even in Ruby proper these aren’t bad principals. Objects should be fast to create because there is great utility in working with objects as abstractions… and if you create slow to initialize objects then what you do is end up pushing logic that really belongs inside the object OUTSIDE the object so you can “work with the object” BEFORE creating the object - because the cost of creation is too high - and that path is the path to the dark side.

joshgoebel on Jun 10, 2019

@pgaspar wrote:

My thinking around this is still a bit fuzzy and the result is that I end up not mentoring Matrix as much as other exercises. […] It’s hard to easily demonstrate the mutability issue on the core exercises where people do this often because the tests usually don’t specify whether the objects should be mutable or not (thinking mainly of Matrix and HighScores).

This seems like another concern tangentially related to the internal representation.

And it also seems like a very interesting point of concern to raise, since Matrix is a core exercise.

Both because you say it’s not preferrable to mentor, and because it may simply be unclear.

@yyyc514 wrote:

Your suggestion has the upside [… lazy …], but the downside [… not abstract …]

I only skimmed, but not sure I follow this point. What I proposed doesn’t require a “specific serialization format”… you could imagine many potential formats… and still delaying the parsing until the actual underlying data was actually requested […]

OK, let me try and phrase it differently: There’s a premature coupling between any serialization format and object initialization: in this case a string from a text file, which could be some other string format, or a maybe even a filename from which to open and read the matrix, or an open file descriptor to such a file, or a network file descriptor, or a callback function.

Any input to a constructor that does not correspond to the internal, abstract representation of the class creates a coupling between instantiating objects and deserializing them. If the deserialization function is separate from the constructor, you can construct objects abstractly.

This premature coupling violates, as you said yourself, the Single-Responsibility Principle because the constructor does not construct objects, but prepares them for construction, and the getters don’t project objects, but deserializes them to some extent, and then projects them. It makes testing difficult, because you need to serialize a matrix before you can initialize it and test its properties.

I think your approach is ingenious in a highly specific use-case.

I didn’t realize I was treating [real work in constructors and lazy initialization] as a single concern. The point is about avoiding work we don’t need to do - regardless of what the internal representation looks like.

OK, then I misunderstood “bloated”.

I understand now that you only speak of avoiding computation.

This is certainly a necessary technique sometimes. For example, when sending out mass emails at my day job, we cannot use our standard, abstract URL object, because its initialization and internal representation is simply too expensive for millions of URLs in a very short time period. So we (sadly) use string concatenation in those cases. In dynamic languages, abstraction does not come for free.

@F3PiX suggested:

remove Matrix as a core exercise, because it raises questions that are not appropriate at this point in the track

add Transpose to the core, but to the end of the track

Find one or more easy exercises that cover the concepts that are currently covered (or: were supposed to be covered) by Matrix with regard to the Track Anatomy.

OK; I can’t tell what concepts are supposed to be covered, so I’ll leave the conversation at this point and wish you luck. 😃

sshine on Jun 7, 2019

Your suggestion has the upside that parsing is delayed, but the downside that the internal representation of a matrix is not abstract and depends on a specific serialization format; cf. what Misko Hevery calls “an inflexible and prematurely coupled design”. I won’t expand on this argument, as his blog post does so amply.

I only skimmed, but not sure I follow this point. What I proposed doesn’t require a “specific serialization format”… you could imagine many potential formats… and still delaying the parsing until the actual underlying data was actually requested - not just when the object was instantiated. I only gave a specific encoding format example to help explain how you could have fast access to some attributes of the matrix but not others.

My input was an offer to regard constructors that do real work, the title of your PR, as a separate concern than how to internally represent a matrix, which is a rabbit hole with no end, since the exercise does not specify any specific use-cases.

I didn’t realize I was treating them as a single concern. The point is about avoiding work we don’t need to do - regardless of what the internal representation looks like.

How do you know that a matrix is LARGE without accessing it abstractly? If you know it by the size of the string, wouldn’t you also know it by the file’s size without even having to read its input? This, I bet you, would be even faster! This, of course, assumes that they’re located in 100 million separate files with each their file size.

These are also potential possibilities for rough idea of size, but I presume the best way to know the EXACT size of the matrix is to ask it itself - not try to guess based on other factors. So yes, you’d instantiate it and ask it directly. And you could imagine designs where that required parsing the whole encoded format - but you could also imagine designs where just fetching the metadata (width, height, etc) was a very fast operation… and in those cases filtering by meta-data would be very efficient as I suggested. UNLESS it got bogged down by an init that was doing way more than absolutely necessary.

I have a personal snippet attempting to explain this for HighScores that I use when I run into an all-in-initializer solution. I haven’t used it often.

I’m going to borrow this or some variant. 😃

joshgoebel on Jun 5, 2019

bloated initialize methods

I’ll try to sell this: There’s a general object-oriented idiom that teaches that constructors shouldn’t do any computational work themselves. An example of this argument is found on Misko Hevery’s Testability Explorer Blog: Flaw: Constructor does Real Work.

This conflicts with the fact that a matrix is given as a string, but the internal representation is column/row-based, which is also sensical. @yyyc514 argues that it’s not the responsibility of the constructor to perform the conversion. On the other hand, it’s inefficient if the object doesn’t cache its representation of a matrix in a way that is efficient for its methods.

One solution is to have a static constructor (factory) called something like from_string that does the work of extract_... such that the real constructor only does assignment. I don’t know what Ruby-specific naming idioms apply for static constructors / factories. They’re quite common in Java and Haskell.

A solution might then look like:

class Matrix
  attr_reader :rows, :columns
  def initialize(rows, columns)
    @rows = rows
    @columns = columns
  end

  def self.from_string(matrix_as_string)
    rows = matrix_as_string.each_line.map { |line| line.split.map(&:to_i) }
    columns = rows.transpose
    Matrix.new(rows, columns)
  end
end

A secondary concern is whether to cache/memoize the column representation or compute it every time it’s returned. And as @F3PiX voices, it depends. I’d elaborate that it depends not just on how you see the class, but how you expect to manipulate its objects. For example, if you expect the matrix to be mutable and that you want to change the values of cells, having the same cell in two places will become a problem. As the exercise is contrived in the sense that it only has rows and columns properties, and no actual use-cases, there’s no right answer here.

It may be “preformance” optimization to memoize columns, or it may not.

So another solution might look like:

class Matrix
  attr_reader :rows
  def initialize(rows)
    @rows = rows
  end

  def columns
    rows.transpose
  end

  def self.from_string(matrix_as_string)
    rows = matrix_as_string.each_line.map { |line| line.split.map(&:to_i) }
    Matrix.new(rows)
  end
end

Either way, the concern of the constructor that does too much work is still valid.

If we keep the two concerns separate, perhaps we can reach a consensus faster. 😃

My impression(!) is that the Ruby community is moving towards doing more work in the initializer. I don’t know where that’s coming from, so if someone can confirm and/or enlighten us, that would be appreciated.

It’s hard for me to confirm or deny this.

But we should have plenty of rubyists on the track available?

The current Mentor Notes are not reflecting all the opinions listed above. If we find consensus on which are viable options, I volunteer to update the mentor notes accordingly.

Since there’s already a PR for this, exercism/website-copy#1051, perhaps this PR should just be updated with the changes around which this issue seeks to reach consensus?

sshine on Jun 4, 2019