ruby: [Matrix] Thoughts: Do not encourage bloated `initialize` methods
Could we have a discussion about this? I think typically this is bad behavior and it’s much better design to do the work when methods are called. This pattern can lead to all sorts of performance problems latter (doing too much work). It’s also a pretty serious violation of SRP.
Thoughts?
Originally filed against the matrix actual exercise: https://github.com/exercism/website-copy/pull/1051
[Edit maud] The current solution in the mentor notes is:
class Matrix
attr_reader :rows, :columns
def initialize(matrix_as_string)
@rows = extract_rows(matrix_as_string)
@columns = rows.transpose
end
private
def extract_rows(string)
string.each_line.map { |line| line.split.map(&:to_i) }
end
end
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Comments: 18 (15 by maintainers)
Let’s try to wrap this conversation up.
My conclusion from the discussion is that the Matrix solution with everything defined in the initializer, is maybe not per se wrong in this particular exercise, but approving an approach that can have repercussions in other situations is misleading for the student.
Matrix is a Level 2 exercise, and the discussion that is provoked by computing the rows and the columns in the initializer, is not appropriate for this level. There are other problems with this exercise as well. For instance, Ruby already has a Matrix class, and we don’t want to ask students to reimplement existing features (at least, not in core exercises).
All of the above leads to the conclusion that Matrix should be removed from the core progression. On the other hand, it offers some features that we badly need on Level 2:
mapwith Symbol#to_procattr_readermemoizationAt the moment, there are no other exercises (that I know of) that can fill the gap if we remove Matrix from the core. Also, Matrix is one of only 2 exercises on Level 2 (Series being the other one, following Matrix). Therefore, I propose to leave the things as they are. And try to find (or create) one or more other exercises that can fill the gap that is currently filled by Matrix. Albeit in a controversial way.
Given all this, can you, @yyyc514, or anyone else not live with this solution temporarily? If not, please explain why.
Also, suggestions for other exercises that can replace Matrix are very welcome.
TL;DR The current Matrix exercise is problematic in multiple ways. Let’s find a replacement for the Matrix exercise as a core asap. In the meantime, approve both approaches: either calculating rows and columns in the initializer, or defining separate methods.
I don’t think this is a great solution because it doesn’t address the performance problem. One can easily imagine a matrix with a variety of methods and I just want to load a bunch of matrixes from a text file and then print out only the LARGE ones. Imagine we had a slightly more parseable format (I just realized it’d have to use newlines different, but you still get the idea). IE:
Now imagine I have a 100 million matrixes on disk. I want to write code like this:
Or you could easily imagine writing it to use less RAM… the point is if there is only a single matrix that is 100x100… other than the disk IO and the cost of
widthandheight(I presume I’d have to parse the first part of the raw string) I can mostly ignore the other 99 million matrixes. They don’t need to be fully parsed, rows and columns don’t need to be computed at all.This is of course lazy loading (and not a new concept), but I think you get this for “free” when you keep
initsmall and push responsibilities down into their own methods.In the realm of Exercism exercises I can think of OTTOMH I think
initshould:In general if 80% of your class’s work is happening inside the
initializeryou’re doing it all wrong. (ie, this Matrix example)My background in Rails probably betrays me a little here, but I think even in Ruby proper these aren’t bad principals. Objects should be fast to create because there is great utility in working with objects as abstractions… and if you create slow to initialize objects then what you do is end up pushing logic that really belongs inside the object OUTSIDE the object so you can “work with the object” BEFORE creating the object - because the cost of creation is too high - and that path is the path to the dark side.
@pgaspar wrote:
This seems like another concern tangentially related to the internal representation.
And it also seems like a very interesting point of concern to raise, since Matrix is a core exercise.
Both because you say it’s not preferrable to mentor, and because it may simply be unclear.
@yyyc514 wrote:
OK, let me try and phrase it differently: There’s a premature coupling between any serialization format and object initialization: in this case a string from a text file, which could be some other string format, or a maybe even a filename from which to open and read the matrix, or an open file descriptor to such a file, or a network file descriptor, or a callback function.
Any input to a constructor that does not correspond to the internal, abstract representation of the class creates a coupling between instantiating objects and deserializing them. If the deserialization function is separate from the constructor, you can construct objects abstractly.
This premature coupling violates, as you said yourself, the Single-Responsibility Principle because the constructor does not construct objects, but prepares them for construction, and the getters don’t project objects, but deserializes them to some extent, and then projects them. It makes testing difficult, because you need to serialize a matrix before you can initialize it and test its properties.
I think your approach is ingenious in a highly specific use-case.
OK, then I misunderstood “bloated”.
I understand now that you only speak of avoiding computation.
This is certainly a necessary technique sometimes. For example, when sending out mass emails at my day job, we cannot use our standard, abstract URL object, because its initialization and internal representation is simply too expensive for millions of URLs in a very short time period. So we (sadly) use string concatenation in those cases. In dynamic languages, abstraction does not come for free.
@F3PiX suggested:
OK; I can’t tell what concepts are supposed to be covered, so I’ll leave the conversation at this point and wish you luck. 😃
I only skimmed, but not sure I follow this point. What I proposed doesn’t require a “specific serialization format”… you could imagine many potential formats… and still delaying the parsing until the actual underlying data was actually requested - not just when the object was instantiated. I only gave a specific encoding format example to help explain how you could have fast access to some attributes of the matrix but not others.
I didn’t realize I was treating them as a single concern. The point is about avoiding work we don’t need to do - regardless of what the internal representation looks like.
These are also potential possibilities for rough idea of size, but I presume the best way to know the EXACT size of the matrix is to ask it itself - not try to guess based on other factors. So yes, you’d instantiate it and ask it directly. And you could imagine designs where that required parsing the whole encoded format - but you could also imagine designs where just fetching the metadata (width, height, etc) was a very fast operation… and in those cases filtering by meta-data would be very efficient as I suggested. UNLESS it got bogged down by an
initthat was doing way more than absolutely necessary.I’m going to borrow this or some variant. 😃
I’ll try to sell this: There’s a general object-oriented idiom that teaches that constructors shouldn’t do any computational work themselves. An example of this argument is found on Misko Hevery’s Testability Explorer Blog: Flaw: Constructor does Real Work.
This conflicts with the fact that a matrix is given as a string, but the internal representation is column/row-based, which is also sensical. @yyyc514 argues that it’s not the responsibility of the constructor to perform the conversion. On the other hand, it’s inefficient if the object doesn’t cache its representation of a matrix in a way that is efficient for its methods.
One solution is to have a static constructor (factory) called something like
from_stringthat does the work ofextract_...such that the real constructor only does assignment. I don’t know what Ruby-specific naming idioms apply for static constructors / factories. They’re quite common in Java and Haskell.A solution might then look like:
A secondary concern is whether to cache/memoize the column representation or compute it every time it’s returned. And as @F3PiX voices, it depends. I’d elaborate that it depends not just on how you see the class, but how you expect to manipulate its objects. For example, if you expect the matrix to be mutable and that you want to change the values of cells, having the same cell in two places will become a problem. As the exercise is contrived in the sense that it only has
rowsandcolumnsproperties, and no actual use-cases, there’s no right answer here.It may be “preformance” optimization to memoize
columns, or it may not.So another solution might look like:
Either way, the concern of the constructor that does too much work is still valid.
If we keep the two concerns separate, perhaps we can reach a consensus faster. 😃
It’s hard for me to confirm or deny this.
But we should have plenty of rubyists on the track available?
Since there’s already a PR for this, exercism/website-copy#1051, perhaps this PR should just be updated with the changes around which this issue seeks to reach consensus?