-
Notifications
You must be signed in to change notification settings - Fork 15
Move code from src-d/ml #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: guillemdb <g.duran@instadeep.com>
Signed-off-by: guillemdb <guillem@sourced.tech>
Signed-off-by: guillemdb <guillem@sourced.tech>
|
Well, as soon as it is almost the same files we have in src-d/ml can you be more specific how do you change it if you did. What I should pay more attention to? |
|
The files are the same, the only stuff I did was:
The functionality has not been changes, I just changed the file locations. |
zurk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @Guillemdb for great work!
I suggest keeping in mind one general point while we splitting sourced-ml package.
Everything that we commit to ml-core should work, be related and consistent.
I do not mean code, but the rest: links, descriptions, etc. Otherwise, it will be harder to maintain. And we will deal with remains of tails from the deprecated package. I do not suggest to adopt it in the PR but rather just exclude it for now until we need this part (see my suggestion about readme for example). I hope I spotted most of the places with old and unrelated things. Please check if there is something else.
There is a list of things I was not able to comment because they are in the initial commit and not in the PR:
- Name of the project in
contributing.mdshould be changed. - Maybe we should update a year from license to 2019. I think Marcelo knows better, please ask him.
- Maintainers the last bracket is not escaped should be
\)instead of ')' - No doc directory means no
SUMMARY.md. let's delete this file.
And two more general things.
- package name. I think
core_mlin not the best choice because usually python packages do not have an underscore in its name. I think what we should do in to havesourced.ml.corepackage and share a namespace withsourced.ml.mining. I suggest usingsourcedprefix to avoid possible overlapping with othersmlpackages. (@vmarkovtsev please approve). - I do not see
.travis.ymlfile. But tests are included. Do you plan to add it later?
|
I pushed a new commit with the requested changes and the .travis.yml file. I corrected the flake8 issues in another branch, if you allow me to ignore them in the next PR they will be fixed. In the Btw, @marnovo told me that there is no consensus on the year of the license, so I didn't make any change to it. |
Signed-off-by: Guillem Duran <guillem@sourced.tech>
Signed-off-by: Guillem Duran <guillem@sourced.tech>
Signed-off-by: Guillem Duran <guillem@sourced.tech>
Signed-off-by: Guillem Duran <guillem@sourced.tech>
Signed-off-by: Guillem Duran <guillem@sourced.tech>
|
The build is failing because I remove the references to bblfsh on the MAKEFILE. Should I add it again so the build passes? |
Signed-off-by: Guillem Duran <guillem@sourced.tech>
|
oh, sure. In this case, we should keep it. |
Signed-off-by: Guillem Duran <guillem@sourced.tech>
Signed-off-by: Guillem Duran <guillem@sourced.tech>
Signed-off-by: Guillem Duran <guillem@sourced.tech>
Signed-off-by: Guillem Duran <guillem@sourced.tech>
|
When I run the tests on my local repository I find that only one of them is failing: ======================================================================
FAIL: test_result (ml_core.tests.test_uast_to_id_distance.Uast2IdLineDistanceTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/guillem/srcd/ml-core/ml_core/tests/test_uast_to_id_distance.py", line 141, in test_result
self.assertEqual(res, correct)
AssertionError: Lists differ: [(('__spec__', 'ModuleSpec'), 0), (('__spec__',[1679 chars], 1)] != [(('__package__', 'ModuleSpec'), 2), (('__spec_[2142 chars], 1)]
First differing element 0:
(('__spec__', 'ModuleSpec'), 0)
(('__package__', 'ModuleSpec'), 2)
Second list contains 14 additional elements.
First extra element 55:
(('utmain', 'collections'), 2)
+ [(('__package__', 'ModuleSpec'), 2),
- [(('__spec__', 'ModuleSpec'), 0),
? ^
+ (('__spec__', 'ModuleSpec'), 0),
? ^
(('__spec__', 'ModuleSpec'), 1),
(('__spec__', 'ModuleSpec'), 1),
+ (('__spec__', 'ModuleSpec'), 2),
(('__spec__', '__package__'), 0),
+ (('collections', 'ModuleSpec'), 1),
(('collections', 'ModuleSpec'), 2),
(('collections', '__package__'), 1),
(('collections', '__spec__'), 1),
+ (('collections', '__spec__'), 2),
(('modules', '__package__'), 1),
(('modules', '__spec__'), 1),
(('modules', 'collections'), 2),
(('namedtuple', 'ModuleSpec'), 0),
(('namedtuple', 'ModuleSpec'), 1),
+ (('namedtuple', 'ModuleSpec'), 1),
(('namedtuple', 'ModuleSpec'), 2),
(('namedtuple', 'ModuleSpec'), 2),
(('namedtuple', '__package__'), 1),
+ (('namedtuple', '__package__'), 2),
(('namedtuple', '__spec__'), 1),
(('namedtuple', '__spec__'), 1),
+ (('namedtuple', '__spec__'), 2),
+ (('namedtuple', '__spec__'), 2),
(('namedtuple', 'collections'), 0),
- (('namedtuple', 'collections'), 2),
? ^
+ (('namedtuple', 'collections'), 1),
? ^
(('namedtuple', 'modules'), 2),
(('setup_logging', 'modelforge.logs'), 0),
(('setup_logging', 'setup'), 1),
(('sys', '__package__'), 1),
(('sys', '__spec__'), 1),
(('sys', 'collections'), 2),
(('sys', 'modelforge.logs'), 2),
(('sys', 'modules'), 0),
(('sys', 'namedtuple'), 2),
(('sys', 'setup_logging'), 2),
(('utmain', 'ModuleSpec'), 0),
(('utmain', 'ModuleSpec'), 1),
(('utmain', 'ModuleSpec'), 1),
(('utmain', 'ModuleSpec'), 1),
(('utmain', 'ModuleSpec'), 2),
+ (('utmain', 'ModuleSpec'), 2),
+ (('utmain', 'ModuleSpec'), 2),
(('utmain', '__package__'), 0),
(('utmain', '__package__'), 0),
(('utmain', '__package__'), 1),
(('utmain', '__spec__'), 0),
(('utmain', '__spec__'), 0),
(('utmain', '__spec__'), 0),
(('utmain', '__spec__'), 1),
(('utmain', '__spec__'), 2),
(('utmain', 'collections'), 1),
(('utmain', 'collections'), 1),
(('utmain', 'collections'), 2),
+ (('utmain', 'collections'), 2),
(('utmain', 'modules'), 0),
(('utmain', 'modules'), 1),
(('utmain', 'modules'), 1),
(('utmain', 'namedtuple'), 1),
(('utmain', 'namedtuple'), 1),
(('utmain', 'namedtuple'), 1),
(('utmain', 'namedtuple'), 2),
+ (('utmain', 'namedtuple'), 2),
+ (('utmain', 'namedtuple'), 2),
+ (('utmain', 'namedtuple'), 2),
(('utmain', 'sys'), 0),
(('utmain', 'sys'), 1),
(('utmain', 'sys'), 1)]Does the file However, during the CI, a lot of tests fail with a similar error to this one> ERROR: test_greatest2 (ml_core.tests.test_df.DocumentFrequenciesTests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/ml_core/ml_core/tests/test_df.py", line 10, in setUp
self.model = DocumentFrequencies().load(source=paths.DOCFREQ)
File "/usr/local/lib/python3.6/dist-packages/modelforge/model.py", line 93, in load
cache_dir = os.path.join(self.cache_dir(), self.NAME)
File "/usr/local/lib/python3.6/dist-packages/modelforge/model.py", line 338, in cache_dir
raise RuntimeError("modelforge is not configured; look at modelforge.configuration. "
RuntimeError: modelforge is not configured; look at modelforge.configuration. Depending on your objective you may or may not want to create a modelforgecfg.py file which sets VENDOR and the rest.The I am creating an additional copy of |
|
I will check tomorrow. |
Signed-off-by: Konstantin Slavnov <konstantin@sourced.tech>
|
I add my changes to the commit. Now everything works fine.
|
|
things that should be addressed in the next PRs (@Guillemdb you can convert this list to issues after we merge this PR):
|
Signed-off-by: Guillem Duran <guillem@sourced.tech>
|
I finally managed to fix the CI and upload the following changes:
======================================================================
ERROR: test_serialize (ml.core.tests.models_tensorflow.test_tensorflow.TensorFlowModelTests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/sourced/ml/core/tests/models_tensorflow/test_tensorflow.py", line 18, in test_serialize
TensorFlowModel().construct(graphdef=gd).save(buffer)
File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/modelforge/model.py", line 390, in save
raise ValueError("series must be specified")
ValueError: series must be specified |
Ok. One thing, you should add not a pre-commit hook but another style check (see #1 (comment)). I am fine if you do it in the next PR. In the best case, black check should completely replace flake8. |
Signed-off-by: Konstantin Slavnov <konstantin@sourced.tech>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only one minor thing.
|
@Guillemdb @vmarkovtsev It would be desirable to preserve history or, at least, separate commits for code import and code changes, so it's clear the point at which it's imported and what changes were made. Please, ping me on Slack if there's any doubt about how to do this. |
|
So far there is no changes to the code functionality besides fixing the Tensorflow test, I was waiting for this PR to be merged to make any change to the core-ml functionality if needed. |
zurk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's wait for @vmarkovtsev input also.
|
@Guillemdb But there were non-functional changes, right? Or just things like imports? |
|
Yeah, all the changed were non-functional, what I did was:
|
|
@Guillemdb It is impossible for me to review 118 files. I will comment and the thing will turn into an unmanageable mess. Can you please split all the changes into feature-full 5-6 PRs and submit them one by one. It is fine that each following PR depends on the previous ones.
|
This is the first stage of the refactor of
src-d/mlintoml-core.