I have a problem that is reducible to the following:
From a collection of stacks, find all items whose "keys" are on all stacks.
My current solution is to pop things off as quickly as possible, store the items in the language's set type, compute the intersection every once in a while (the stacks are refilling constantly), use that set, and repeat.
The issue here is that the equivalent of the pop operation is expensive (I can do this at most 4k times a second), and it kills the chance of multithreading (as pops are destructive).
Is there a way of doing this such that I can multithread the intersection bit? I cannot practically use shared memory or requeue items. The threads can communicate with each other via pipes/sockets if need be, but such communication should be kept to a minimum (I am not really interested in starting up some whole separate client/server thing just for this part of my application).
As an idea of what I am dealing with, there can be up to 15 million items on the queues at any one time. By the time it's "over", all items will have a match.
Ideas?
Clarification: These are message stacks (well, queues really, but the same thing) over the network. A pop is a consume operation over the network, and a push is a publish. That's why it's so slow. Since the issue is one of latency rather than bandwidth, multithreading can potentially help. I am not CPU bound. I have ~10 stacks, and up to 10 million items at a time. I have no control over the implementation of the stacks.
Apparently this is a distributed systems problem. You have ~ 10 stacks, each with up to 10 million items at a time, and each stack living on a different machine.
Here is one simple candidate solution. Have a separate "monitor" machine, whose sole purpose is to continually compute the intersection. Any time one of the stacks is modified, the machine storing that stack should send a message to the monitor describing the change. (These diffs can be batched, depending upon your latency requirements.)
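To make the batching idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `DiffBatcher` name, the `("push" | "pop", key)` message shape, and the `send` callback (which stands in for whatever pipe or socket write you would use) are my assumptions, not something taken from your setup.

```python
from typing import Callable, List, Tuple

# Hypothetical message shape: ("push" or "pop", hashed key).
Diff = Tuple[str, bytes]


class DiffBatcher:
    """Collects diffs for one stack and hands them to `send` in batches."""

    def __init__(self, send: Callable[[List[Diff]], None], max_batch: int = 100):
        self._send = send            # assumed transport, e.g. a socket write
        self._max_batch = max_batch  # larger batches: less traffic, more latency
        self._pending: List[Diff] = []

    def record(self, op: str, key: bytes) -> None:
        """Queue one change; flush automatically once the batch is full."""
        self._pending.append((op, key))
        if len(self._pending) >= self._max_batch:
            self.flush()

    def flush(self) -> None:
        """Send whatever is pending (no-op if nothing is queued)."""
        if self._pending:
            self._send(self._pending)
            self._pending = []
```

In practice you would probably also call `flush()` on a timer, so that a quiet stack does not hold its last few diffs back indefinitely.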
Now you have all the data on a single machine, the monitor machine. When running on a single machine, you can completely avoid all of the concurrency issues associated with multithreading or distributed systems -- e.g., the need for synchronization, locks, etc.
Moreover, the amount of data is small enough that it could easily be stored in the available RAM on that monitor machine. (Why? If each entry is large, hash it first using a good hash function and a large enough hash output that you won't see collisions. If each hash value is 128 bits, then storing all 100 million hashed items takes 1.6GB. If you use SHA256 truncated to 128 bits as your hash function, by the birthday paradox, the chances of encountering a hash collision are incredibly small.)
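As a rough sketch of the hashing step and the arithmetic behind those numbers (Python; the function name `key_digest` is just an illustration, and I am assuming your keys are available as bytes):

```python
import hashlib


def key_digest(key: bytes) -> bytes:
    """SHA-256 truncated to 128 bits (16 bytes)."""
    return hashlib.sha256(key).digest()[:16]


NUM_STACKS = 10
ITEMS_PER_STACK = 10_000_000
DIGEST_BYTES = 16

total_items = NUM_STACKS * ITEMS_PER_STACK  # 100 million
raw_bytes = total_items * DIGEST_BYTES      # 1.6e9 bytes, i.e. 1.6 GB

# Birthday bound: P(any collision) is at most about n^2 / 2^(bits + 1).
collision_bound = total_items ** 2 / 2 ** 129
```

Note that 1.6 GB counts only the raw digests. A real hash table adds per-entry overhead on top of that, so budget accordingly for whatever language and container you use.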
Once all of the data is living in RAM on a single machine, the problem becomes much easier. For example, one approach would be to build an index data structure: a hashmap that maps each key to a 10-bit bitmap that indicates which of the stacks it is stored on. You can easily update this hashmap as the individual stacks change.
You can also easily use this hashmap to compute the intersection: just have a doubly linked list threaded through the entries of this hashmap where the bitmap is equal to 1111111111 (all ten bits set). Any time a bitmap changes from 1111111111 to something else, you remove it from the doubly linked list. Any time a bitmap changes from something else to 1111111111, you insert it into the doubly linked list.
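Here is a minimal sketch of that index in Python. The class and method names are made up for illustration. I have swapped the doubly linked list for a plain `set`, which gives the same O(1) insert and remove, and I am assuming each key appears at most once per stack (if it can appear several times, you would need a per-stack count instead of a single bit).

```python
from typing import Dict, Set


class IntersectionIndex:
    """Maps each key to a bitmap of the stacks that currently hold it."""

    def __init__(self, num_stacks: int):
        self._full = (1 << num_stacks) - 1   # e.g. 0b1111111111 for 10 stacks
        self._bitmaps: Dict[bytes, int] = {}
        self._in_all: Set[bytes] = set()     # keys whose bitmap is all 1's

    def push(self, stack: int, key: bytes) -> None:
        """Record that `key` is now on stack number `stack` (0-based)."""
        bitmap = self._bitmaps.get(key, 0) | (1 << stack)
        self._bitmaps[key] = bitmap
        if bitmap == self._full:
            self._in_all.add(key)

    def pop(self, stack: int, key: bytes) -> None:
        """Record that `key` has left stack number `stack`."""
        bitmap = self._bitmaps.get(key, 0) & ~(1 << stack)
        if bitmap != self._full:
            self._in_all.discard(key)
        if bitmap:
            self._bitmaps[key] = bitmap
        else:
            self._bitmaps.pop(key, None)     # drop entries no stack holds

    def intersection(self) -> Set[bytes]:
        """Snapshot of the keys currently present on every stack."""
        return set(self._in_all)
```

For example, with three stacks, a key shows up in `intersection()` only after `push(0, k)`, `push(1, k)` and `push(2, k)` have all been seen, and it disappears again as soon as any one of them is popped.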
The amount of data you need to transfer from other machines to the monitor machine is very small, especially since you only need to send the hashed items to the monitor, not the original items. If each of the 10 machines will do at most 4k push/pop's per second (taken from your question), then that's 64 KB of hashed data per second per machine. If we add another 32 bytes or so of overhead (packet headers), that's about 200 KB/s of data out of each machine, or about 2 MB/s of data into the monitor machine -- a very manageable amount. On a 1 Gbps Ethernet link you won't notice this, and you might not notice it even on a 100 Mbps link. You can reduce the amount of traffic by a factor of 2-5x by batching such messages and/or by using a shorter hash.
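If you want to redo that estimate with your own numbers, the arithmetic is short enough to write down (Python; the constants are the ones used above, and the 32-byte overhead is only a rough guess at per-message headers):

```python
OPS_PER_SEC = 4_000   # pushes/pops per second per machine (from the question)
HASH_BYTES = 16       # 128-bit hashed key
OVERHEAD_BYTES = 32   # rough per-message overhead (assumed)
MACHINES = 10

hash_rate = OPS_PER_SEC * HASH_BYTES                       # 64,000 B/s
per_machine = OPS_PER_SEC * (HASH_BYTES + OVERHEAD_BYTES)  # 192,000 B/s, ~200 KB/s
into_monitor = per_machine * MACHINES                      # 1,920,000 B/s, ~2 MB/s

# Share of a 100 Mbps link used by the traffic into the monitor.
link_fraction_100mbps = into_monitor * 8 / 100_000_000     # about 15%
```

Batching several diffs into one message mostly shrinks the `OVERHEAD_BYTES` term, which is where the 2-5x saving comes from.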
This should provide an efficient algorithm to compute the intersection, and keep it updated on the fly as each of the stacks changes.