MapReduce Joins not outputting correct result
I am trying to implement a join in MapReduce between two files, both of which contain details of bitcoin transactions. The first file has the following fields: hash, value, n, publickey
The second file, which I want to join it to, has the following fields: txid, tx_hash, vout
What I want to do is perform a replication join between the hash field of the first file and the txid field of the second file. These two fields refer to the same value, so the join should find the transactions in the second file that are also referenced in the first file. Here is the code I have written so far:
from mrjob.job import MRJob
from mrjob.step import MRStep

class repl_stock_join(MRJob):
    sector_table = {}

    def mapper_join_init(self):
        with open("filtertemp.csv") as f:
            for line in f:
                fields = line.split(',')
                key = fields[0]
                val = "hash"
                self.sector_table[key] = val

    def mapper_repl_join(self, _, line):
        try:
            fields = line.split(',')
            txid = fields[0]
            if txid in self.sector_table:
                print(txid)
                yield(txid, None)
        except:
            pass

    def steps(self):
        return [MRStep(mapper_init=self.mapper_join_init, mapper=self.mapper_repl_join)]

if __name__ == '__main__':
    repl_stock_join.run()
However, when I run this code, the only output is an empty text file, which should not be the case. Any help fixing this is appreciated.
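To make the intended behaviour concrete, here is a minimal plain-Python sketch of the join I am after, written outside mrjob so it can be run on its own. The sample rows and the helper names `load_hashes` and `replicated_join` are hypothetical, and stripping whitespace from the key is an assumption about the data, not something I have confirmed is needed.

```python
import csv
import io


def load_hashes(lines):
    """Build the in-memory lookup set from the small (replicated) file."""
    hashes = set()
    for row in csv.reader(lines):
        if row:
            # Assumption: the join key is the first column (hash).
            hashes.add(row[0].strip())
    return hashes


def replicated_join(big_lines, hashes):
    """Yield the txid of every row in the large file whose txid is in hashes."""
    for row in csv.reader(big_lines):
        if row and row[0].strip() in hashes:
            yield row[0].strip()


# Hypothetical sample rows, not real transactions.
small_file = io.StringIO("abc,1.5,0,pk1\ndef,0.2,1,pk2\n")  # hash, value, n, publickey
big_file = io.StringIO("abc,h1,0\nzzz,h2,1\ndef,h3,0\n")    # txid, tx_hash, vout

matches = list(replicated_join(big_file, load_hashes(small_file)))
print(matches)  # ['abc', 'def']
```

With real data, the same membership test on the first column is what I expect the mapper above to perform for each input line.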
python hadoop mapreduce bigdata mrjob
asked Nov 25 '18 at 23:13
Arsenalfan