Skip to content
This repository was archived by the owner on Sep 6, 2021. It is now read-only.

improvements to file system events processing for search #12885

Merged
merged 4 commits into from
Nov 17, 2016
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
198 changes: 137 additions & 61 deletions src/search/FindInFiles.js
Original file line number Diff line number Diff line change
Expand Up @@ -73,12 +73,24 @@ define(function (require, exports, module) {
var searchModel = new SearchModel();

/* Forward declarations */
var _documentChangeHandler, _fileSystemChangeHandler, _fileNameChangeHandler, clearSearch;
var _documentChangeHandler,
_fileSystemChangeHandler,
_processCachedFileSystemEvents,
_debouncedFileSystemChangeHandler,
_fileNameChangeHandler,
clearSearch;

/**
* Waits for FS changes to stack up until processing them
* (scripts like npm install can do a lot of movements on the disk)
* @const
*/
var FILE_SYSTEM_EVENT_DEBOUNCE_TIME = 100;

/** Remove the listeners that were tracking potential search result changes */
function _removeListeners() {
DocumentModule.off("documentChange", _documentChangeHandler);
FileSystem.off("change", _fileSystemChangeHandler);
FileSystem.off("change", _debouncedFileSystemChangeHandler);
DocumentManager.off("fileNameChange", _fileNameChangeHandler);
}

Expand All @@ -88,7 +100,7 @@ define(function (require, exports, module) {
_removeListeners();

DocumentModule.on("documentChange", _documentChangeHandler);
FileSystem.on("change", _fileSystemChangeHandler);
FileSystem.on("change", _debouncedFileSystemChangeHandler);
DocumentManager.on("fileNameChange", _fileNameChangeHandler);
}

Expand Down Expand Up @@ -658,7 +670,7 @@ define(function (require, exports, module) {
* @param {array} fileList The list of files that changed.
*/
function filesChanged(fileList) {
if (FindUtils.isNodeSearchDisabled() || fileList.length === 0) {
if (FindUtils.isNodeSearchDisabled() || !fileList || fileList.length === 0) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

!fileList covers fileList.length === 0 too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no it doesn't

var emptyArray = []; var a = !emptyArray; var b = emptyArray.length === 0; [a,b];
[false, true]

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh, sorry about that @zaggino, I automatically assumed that fileList would never be an empty array just undefined. Again, sorry.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving it as is, it's a very cheap check and better be safe than sorry :)

return;
}
var updateObject = {
Expand All @@ -676,7 +688,7 @@ define(function (require, exports, module) {
* @param {array} fileList The list of files that was removed.
*/
function filesRemoved(fileList) {
if (FindUtils.isNodeSearchDisabled()) {
if (FindUtils.isNodeSearchDisabled() || !fileList || fileList.length === 0) {
return;
}
var updateObject = {
Expand Down Expand Up @@ -732,69 +744,83 @@ define(function (require, exports, module) {

/*
* Remove existing search results that match the given entry's path
* @param {(File|Directory)} entry
* @param {Array.<(File|Directory)>} entries
*/
function _removeSearchResultsForEntry(entry) {
Object.keys(searchModel.results).forEach(function (fullPath) {
if (fullPath === entry.fullPath ||
(entry.isDirectory && fullPath.indexOf(entry.fullPath) === 0)) {
// node search : inform node that the file is removed
filesRemoved([fullPath]);
if (findOrReplaceInProgress) {
searchModel.removeResults(fullPath);
resultsChanged = true;
function _removeSearchResultsForEntries(entries) {
var fullPaths = [];
entries.forEach(function (entry) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth checking out if couple of Array.prototype.maps would be faster than nested forEachs and a push

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fullPaths.push(fullPath); is inside an if so if map would be used, we'd need to filter it out for nulls afterwards

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid point 👍

Object.keys(searchModel.results).forEach(function (fullPath) {
if (fullPath === entry.fullPath ||
(entry.isDirectory && fullPath.indexOf(entry.fullPath) === 0)) {
// node search : inform node that the file is removed
fullPaths.push(fullPath);
if (findOrReplaceInProgress) {
searchModel.removeResults(fullPath);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be called once with a large array too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removeResults is a classic sync method, so it wouldn't do any benefit here

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

zaggino@ Bro how are you today? Please Bro help us we want CoffeeScript in Brakets and also
Babel , and many's other please

resultsChanged = true;
}
}
}
});
});
// this should be called once with a large array instead of numerous calls with single items
filesRemoved(fullPaths);
}

/*
* Add new search results for this entry and all of its children
* @param {(File|Directory)} entry
* Add new search results for these entries and all of its children
* @param {Array.<(File|Directory)>} entries
* @return {jQuery.Promise} Resolves when the results have been added
*/
function _addSearchResultsForEntry(entry) {
var addedFiles = [],
addedFilePaths = [],
deferred = new $.Deferred();

// gather up added files
var visitor = function (child) {
// Replicate filtering that getAllFiles() does
if (ProjectManager.shouldShow(child)) {
if (child.isFile && _isReadableText(child.name)) {
// Re-check the filtering that the initial search applied
if (_inSearchScope(child)) {
addedFiles.push(child);
addedFilePaths.push(child.fullPath);
function _addSearchResultsForEntries(entries) {
var fullPaths = [];
Copy link
Collaborator

@ficristo ficristo Nov 12, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a rough idea of the memory usage here? If we store a lot of path could cause some trouble?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, this array is not being persisted besides the function execution lifetime and I wouldn't expect there to be a problem even if you have thousands of entries in it. On the other hand, without it, trying to send thousands of messages to a node process will definitely crash it (or socket server). In other words, I don't think memory usage is something to worry about.

return Async.doInParallel(entries, function (entry) {
var addedFiles = [],
addedFilePaths = [],
deferred = new $.Deferred();

// gather up added files
var visitor = function (child) {
// Replicate filtering that getAllFiles() does
if (ProjectManager.shouldShow(child)) {
if (child.isFile && _isReadableText(child.name)) {
// Re-check the filtering that the initial search applied
if (_inSearchScope(child)) {
addedFiles.push(child);
addedFilePaths.push(child.fullPath);
}
}
return true;
}
return true;
}
return false;
};
return false;
};

entry.visit(visitor, function (err) {
if (err) {
deferred.reject(err);
return;
}
entry.visit(visitor, function (err) {
if (err) {
deferred.reject(err);
return;
}

//node Search : inform node about the file changes
filesChanged(addedFilePaths);
//node Search : inform node about the file changes
//filesChanged(addedFilePaths);
fullPaths = fullPaths.concat(addedFilePaths);

if (findOrReplaceInProgress) {
// find additional matches in all added files
Async.doInParallel(addedFiles, function (file) {
return _doSearchInOneFile(file)
.done(function (foundMatches) {
resultsChanged = resultsChanged || foundMatches;
});
}).always(deferred.resolve);
}
});
if (findOrReplaceInProgress) {
// find additional matches in all added files
Async.doInParallel(addedFiles, function (file) {
return _doSearchInOneFile(file)
.done(function (foundMatches) {
resultsChanged = resultsChanged || foundMatches;
});
}).always(deferred.resolve);
} else {
deferred.resolve();
}
});

return deferred.promise();
return deferred.promise();
}).always(function () {
// this should be called once with a large array instead of numerous calls with single items
filesChanged(fullPaths);
});
}

if (!entry) {
Expand All @@ -804,23 +830,23 @@ define(function (require, exports, module) {

var addPromise;
if (entry.isDirectory) {
if (!added || !removed || (added.length === 0 && removed.length === 0)) {
if (added.length === 0 && removed.length === 0) {
// If the added or removed sets are null, must redo the search for the entire subtree - we
// don't know which child files/folders may have been added or removed.
_removeSearchResultsForEntry(entry);
_removeSearchResultsForEntries([ entry ]);

var deferred = $.Deferred();
addPromise = deferred.promise();
entry.getContents(function (err, entries) {
Async.doInParallel(entries, _addSearchResultsForEntry).always(deferred.resolve);
_addSearchResultsForEntries(entries).always(deferred.resolve);
});
} else {
removed.forEach(_removeSearchResultsForEntry);
addPromise = Async.doInParallel(added, _addSearchResultsForEntry);
_removeSearchResultsForEntries(removed);
addPromise = _addSearchResultsForEntries(added);
}
} else { // entry.isFile
_removeSearchResultsForEntry(entry);
addPromise = _addSearchResultsForEntry(entry);
_removeSearchResultsForEntries([ entry ]);
addPromise = _addSearchResultsForEntries([ entry ]);
}

addPromise.always(function () {
Expand All @@ -830,6 +856,56 @@ define(function (require, exports, module) {
}
});
};

/**
* This stores file system events emitted by watchers that were not yet processed
*/
var _cachedFileSystemEvents = [];

/**
* Debounced function to process emitted file system events
* for cases when there's a lot of fs events emitted in a very short period of time
*/
_processCachedFileSystemEvents = _.debounce(function () {
// we need to reduce _cachedFileSystemEvents not to contain duplicates!
_cachedFileSystemEvents = _cachedFileSystemEvents.reduce(function (result, obj) {
var fullPath = obj.entry ? obj.entry.fullPath : null;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If fullPath is null maybe return immediately?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deeper in the code there's a handle for this case (doing a full refresh I think) so it didn't want to stop it here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the assignment below is safe: result[fullPath] = obj; ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it actuall assigns result[null] = obj; which is a bit weird but valid so I left it like that.

// merge added & removed
if (result[fullPath] && obj.isDirectory) {
obj.added = obj.added.concat(result[fullPath].added);
obj.removed = obj.removed.concat(result[fullPath].removed);
}
// use the latest event as base
result[fullPath] = obj;
return result;
}, {});
_.forEach(_cachedFileSystemEvents, function (obj) {
_fileSystemChangeHandler(obj.event, obj.entry, obj.added, obj.removed);
});
_cachedFileSystemEvents = [];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are accessing the same global, and this function is called with a delay, don't we risk here to delete the events that have been added in the bouncedFileSystemChangeHandler function while waiting this to run?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How exactly? There's a _.forEach above this that launches _fileSystemChangeHandler for every object in the array, before deleting it. The function is completely synchronous - if an fs event is emitted while this is running, it'll be added to the empty array once this function finishes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably I was overthinking, but I was wondering if there could be some races between _debouncedFileSystemChangeHandler and _processCachedFileSystemEvents.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, races certainly can't happen in this case. They are both completely sync functions - no promises or async-await there.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The debounce and relative timout are threwing me off.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

debounce just wraps a function and keeps postponing its execution until the wrapper stops to be called for the timeout time. But once the wrapped function is executed, all happens totally sync so the code is safe. I've used this pattern a few times before.

}, FILE_SYSTEM_EVENT_DEBOUNCE_TIME);

/**
* Wrapper function for _fileSystemChangeHandler which handles all incoming fs events
* putting them to cache and executing a debounced function
*/
_debouncedFileSystemChangeHandler = function (event, entry, added, removed) {
// normalize this here so we don't need to handle null later
var isDirectory = false;
if (entry && entry.isDirectory) {
isDirectory = true;
added = added || [];
removed = removed || [];
}
_cachedFileSystemEvents.push({
event: event,
entry: entry,
isDirectory: isDirectory,
added: added,
removed: removed
});
_processCachedFileSystemEvents();
};

/**
* On project change, inform node about the new list of files that needs to be crawled.
Expand Down