From a list of couples of rows to a flat list of rows - python

I'd like to find couples of items whose prices differ by less than $5. This works:
import sqlite3
db = sqlite3.connect(':memory:')
c = db.cursor()
c.execute('CREATE TABLE mytable (id integer, price integer, name text)')
NAMES = ['Item A', 'Item B', 'Item C', 'Item D', 'Item E', 'Item F']
PRICES = [100, 101, 102, 189, 190, 229]
for i in range(len(NAMES)):
    c.execute('INSERT INTO mytable VALUES (?, ?, ?)', (i, PRICES[i], NAMES[i]))
c.execute('SELECT mt1.*, mt2.* FROM mytable mt1, mytable mt2 WHERE ABS(mt1.price - mt2.price) < 5 AND mt1.id < mt2.id')
for e in c.fetchall(): print e
(0, 100, u'Item A', 1, 101, u'Item B')
(0, 100, u'Item A', 2, 102, u'Item C')
(1, 101, u'Item B', 2, 102, u'Item C')
(3, 189, u'Item D', 4, 190, u'Item E')
How can I get a flat list instead of a list of couples? I.e.:
(0, 100, u'Item A') # 1st item of couple #1
(1, 101, u'Item B') # 2nd item of couple #1
(0, 100, u'Item A') # 1st item of couple #2
(2, 102, u'Item C') # 2nd item of couple #2
(1, 101, u'Item B') # 1st item of couple #3
(2, 102, u'Item C') # 2nd item of couple #3
(3, 189, u'Item D') # 1st item of couple #4
(4, 190, u'Item E') # 2nd item of couple #4

You can do:
select mt.*
from mytable mt
where exists (select 1 from mytable mt2 where abs(mt.price - mt2.price) < 5 and mt.id <> mt2.id);
This does not arrange them in any particular order, though.
If you do want them in order, then perhaps unpivoting is the best option:
SELECT (CASE WHEN n = 1 THEN mt1.id ELSE mt2.id END) as id,
       (CASE WHEN n = 1 THEN mt1.price ELSE mt2.price END) as price,
       (CASE WHEN n = 1 THEN mt1.name ELSE mt2.name END) as name
FROM mytable mt1 JOIN
     mytable mt2
     ON ABS(mt1.price - mt2.price) < 5 AND mt1.id < mt2.id CROSS JOIN
     (SELECT 1 as n UNION ALL SELECT 2) n
ORDER BY mt1.id, mt2.id, n;
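To try the unpivoting idea from Python, here is a minimal sketch using the question's data. It assumes Python 3, renames the helper table to `side` (a hypothetical alias, chosen only to keep it distinct from its column `n`), and adds `side.n` to the ORDER BY so the first item of each couple sorts before the second.

```python
import sqlite3

db = sqlite3.connect(":memory:")
c = db.cursor()
c.execute("CREATE TABLE mytable (id integer, price integer, name text)")
c.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(0, 100, "Item A"), (1, 101, "Item B"), (2, 102, "Item C"),
     (3, 189, "Item D"), (4, 190, "Item E"), (5, 229, "Item F")],
)

# Each matching couple is joined to a two-row helper table, so it appears
# twice: once picking the mt1 columns (n = 1), once the mt2 columns (n = 2).
query = """
SELECT CASE WHEN side.n = 1 THEN mt1.id ELSE mt2.id END AS id,
       CASE WHEN side.n = 1 THEN mt1.price ELSE mt2.price END AS price,
       CASE WHEN side.n = 1 THEN mt1.name ELSE mt2.name END AS name
FROM mytable mt1
JOIN mytable mt2
  ON ABS(mt1.price - mt2.price) < 5 AND mt1.id < mt2.id
CROSS JOIN (SELECT 1 AS n UNION ALL SELECT 2) side
ORDER BY mt1.id, mt2.id, side.n
"""
flat = c.execute(query).fetchall()
for row in flat:
    print(row)
```

Under Python 3 the names print without the `u` prefix; otherwise the rows should match the flat list asked for in the question.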

Instead of globbing all columns, try specifying your column names:
SELECT mt1.id, mt1.price, mt1.name,
mt2.id, mt2.price, mt2.name
FROM mytable mt1, mytable mt2
WHERE ABS(mt1.price - mt2.price) < 5
AND mt1.id < mt2.id
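Once the six columns are listed explicitly, the flattening can also be done on the Python side by cutting each row in half. The following is only a sketch, assuming Python 3 and the question's table; `WIDTH` is a hypothetical constant for the number of columns per item, and the ORDER BY is added here to keep the couples in a predictable order.

```python
import sqlite3

db = sqlite3.connect(":memory:")
c = db.cursor()
c.execute("CREATE TABLE mytable (id integer, price integer, name text)")
c.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(0, 100, "Item A"), (1, 101, "Item B"), (2, 102, "Item C"),
     (3, 189, "Item D"), (4, 190, "Item E"), (5, 229, "Item F")],
)
c.execute("""
    SELECT mt1.id, mt1.price, mt1.name,
           mt2.id, mt2.price, mt2.name
    FROM mytable mt1, mytable mt2
    WHERE ABS(mt1.price - mt2.price) < 5
      AND mt1.id < mt2.id
    ORDER BY mt1.id, mt2.id
""")

WIDTH = 3  # columns per item: id, price, name
flat = []
for pair in c.fetchall():
    flat.append(pair[:WIDTH])   # 1st item of the couple
    flat.append(pair[WIDTH:])   # 2nd item of the couple

for row in flat:
    print(row)
```

This keeps the SQL identical to the original query and moves only the reshaping into Python, which may be easier to adapt if the table gains columns.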

You can add a small derived table that tells the query whether to take the values from mt1 or mt2. This also stays close to your original query.
SELECT (CASE WHEN tbl.col='mt1' THEN mt1.id ELSE mt2.id END) as id,
(CASE WHEN tbl.col='mt1' THEN mt1.price ELSE mt2.price END) as price,
(CASE WHEN tbl.col='mt1' THEN mt1.name ELSE mt2.name END) as name
FROM mytable mt1, mytable mt2,
(SELECT 'mt1' as col
UNION SELECT 'mt2' ) tbl
WHERE ABS(mt1.price-mt2.price) < 5
AND mt1.id < mt2.id
ORDER BY mt1.id, mt2.id;

Related

How to find couples (or tuples) in a Sqlite query?

Let's say we have this dataset:
import sqlite3
db = sqlite3.connect(':memory:')
c = db.cursor()
c.execute('CREATE TABLE mytable (id integer, description text)')
c.execute("INSERT INTO mytable VALUES (2, 'abc')")
c.execute("INSERT INTO mytable VALUES (5, 'def')")
c.execute("INSERT INTO mytable VALUES (18, 'geh')")
c.execute("INSERT INTO mytable VALUES (19, 'ijk')")
c.execute("INSERT INTO mytable VALUES (27, 'lmn')")
How can I find pairs (couples) of rows whose IDs differ by at most 3? I.e. it should return the rows (2,5) and (18,19), and maybe also (5,2) and (19,18) (but these last two are not needed).
I tried:
c.execute('SELECT id as id1 FROM mytable, SELECT id as id2 FROM mytable WHERE abs(id1 - id2) <= 3')
but it does not seem to work:
sqlite3.OperationalError: near "SELECT": syntax error
Try this:
SELECT mt1.*, mt2.* FROM mytable mt1, mytable mt2 WHERE abs(mt1.id - mt2.id) <= 3 and mt1.id<mt2.id
Output:
[(2, u'abc', 5, u'def'), (18, u'geh', 19, u'ijk')]
The condition mt1.id<mt2.id is used to remove the duplicates.
SELECT t1.id, t2.id
FROM mytable t1
CROSS JOIN mytable t2
WHERE t1.id != t2.id AND abs(t1.id - t2.id) <= 3
Output:
[(2, 5), (5, 2), (18, 19), (19, 18)]
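For comparison, both variants can be run side by side. This is a small sketch assuming Python 3 and the dataset from the question; the ORDER BY clauses are an addition so that the output order is deterministic.

```python
import sqlite3

db = sqlite3.connect(":memory:")
c = db.cursor()
c.execute("CREATE TABLE mytable (id integer, description text)")
c.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(2, "abc"), (5, "def"), (18, "geh"), (19, "ijk"), (27, "lmn")],
)

# t1.id < t2.id keeps each couple once, smaller id first.
once = c.execute("""
    SELECT t1.id, t2.id
    FROM mytable t1 JOIN mytable t2
      ON t1.id < t2.id AND abs(t1.id - t2.id) <= 3
    ORDER BY t1.id, t2.id
""").fetchall()

# t1.id != t2.id keeps both directions of each couple.
both = c.execute("""
    SELECT t1.id, t2.id
    FROM mytable t1 CROSS JOIN mytable t2
    WHERE t1.id != t2.id AND abs(t1.id - t2.id) <= 3
    ORDER BY t1.id, t2.id
""").fetchall()

print(once)
print(both)
```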

sqlfiddle & MySQL, not the same result with @variable

I am trying to figure out why MySQL doesn't return the same result as sqlfiddle. Can someone help me fix my query?
TABLE
CREATE TABLE `test` (
`id` int(11) UNIQUE,
`name` varchar(20),
`time` int(11),
`points` int(11),
`car` varchar(10),
`date` date
);
INSERT INTO `test` (`id`, `name`, `time`, `points`, `car`, `date`) VALUES
(1, 'Daniel', 55, 210, 'red', '2018-01-20'),
(2, 'Daniel', 45, 250, 'green', '2018-01-21'),
(3, 'Julie', 54, 220, 'red', '2018-01-19'),
(4, 'Julie', 33, 150, 'yellow', '2018-01-22');
REQUEST
SELECT *, @row_num, @prev_value,
@row_num := IF(@prev_value=bla.name,@row_num+1,1) as RowNumber,
@prev_value:=bla.name
FROM (SELECT * FROM TEST ORDER BY name, TEST.POINTS DESC, TEST.TIME ASC) bla,
(SELECT @row_num := 1) x
sqlfiddle result
MySQL result
Why is @prev_value always NULL?
thank you
Nico

Can't get column names after first table return from Stored procedure in pymssql

I have a procedure in sql server that returns multiple tables:
create procedure newtest
as
begin
select 1 as a
select 2 as b
end
In Python, cursor.description only returns the first column name: a
I want to get every column name in each table.
How can I do that?
This is my code:
cur.execute(com)
num_tables=0
while True:
    print(cur.description)
    ret=cur.fetchall()
    if len(ret)>0:
        ret_list.append(ret)
        num_tables+=1
        print(ret)
    else:
        break
If the command returns multiple tables (i.e. multiple result sets), you can use Cursor.nextset() to switch from one set to the next.
Something like:
num_tables = 0
while True:
    print(cur.description)
    # all lines of the current set
    ret = cur.fetchall()
    if len(ret) > 0:
        ret_list.append(ret)
        num_tables += 1
        print(ret)
    # check and fetch the next set
    if not cur.nextset():
        break
The result sets do not need to have the same number of columns. For example, with:
create procedure newtest
as
begin
select 1 as a
select 2 as b, 3 as c
end
The result is :
(('a', <class 'int'>, None, 10, 10, 0, False),)
[(1, )]
(('b', <class 'int'>, None, 10, 10, 0, False), ('c', <class 'int'>, None, 10, 10, 0, False))
[(2, 3)]
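The same loop can be wrapped in a small generator that pairs the column names with the rows of each result set. This is only a sketch: `iter_result_sets` is a hypothetical helper name, and `FakeCursor` is a made-up stand-in that mimics the `description` / `fetchall()` / `nextset()` behaviour described above so the example runs without a SQL Server. With pymssql you would pass the real cursor after `execute()`.

```python
def iter_result_sets(cursor):
    """Yield (column_names, rows) for each result set of an executed cursor."""
    while True:
        # description is None when the current statement produced no columns
        if cursor.description is not None:
            names = [col[0] for col in cursor.description]
            yield names, cursor.fetchall()
        # nextset() returns a false value once there is no further result set
        if not cursor.nextset():
            break


class FakeCursor:
    """Hypothetical stand-in for a DB-API cursor, used only for this demo."""

    def __init__(self, result_sets):
        self._sets = result_sets
        self._pos = 0

    @property
    def description(self):
        names, _ = self._sets[self._pos]
        # DB-API description entries are 7-item sequences; only the name is filled in
        return tuple((name, None, None, None, None, None, None) for name in names)

    def fetchall(self):
        return list(self._sets[self._pos][1])

    def nextset(self):
        if self._pos + 1 < len(self._sets):
            self._pos += 1
            return True
        return None


# Mimics the two result sets of the `newtest` procedure above.
cur = FakeCursor([(["a"], [(1,)]), (["b", "c"], [(2, 3)])])
tables = list(iter_result_sets(cur))
for names, rows in tables:
    print(names, rows)
```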

Convert query result types when using fetchall()

I am writing a Python database data backup script targeting an instance of Microsoft SQL Server 2012. My goal is to be able to easily back up a database's data to an existing empty schema. I am using pymssql as my database interface.
Here is the section of code I am having problems with:
def run_migrate(s_conn, d_conn):
    source_cursor = s_conn.cursor()
    dest_cursor = d_conn.cursor()
    source_cursor.execute("SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES order by TABLE_NAME asc")
    res_tables = source_cursor.fetchall()
    tables = [i[0] for i in res_tables]
    for i in range(len(tables)):
        query_columns = "SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = '" + str(tables[i]) + "'"
        l.log.debug(query_columns)
        source_cursor.execute(query_columns)
        res_columns = source_cursor.fetchall()
        columns = [i[0] for i in res_columns]
        query_data = 'SELECT * FROM ' + tables[i]
        l.log.debug(query_data)
        source_cursor.execute(query_data)
        res_results = source_cursor.fetchall()
        # count the fetched rows (res_results; `results` was never defined)
        l.log.info('Extracted ' + str(len(res_results)) + ' results from ' + tables[i])
My problem is that when I use res_results = fetchall() I get back a list that looks like the following:
<class 'list'>: [(Decimal('102'), datetime.datetime(2016, 3, 29, 13, 53, 20, 593000), '1459281200561077E9152BF85FCBC33E6689E9DCB61705985137701459280466827', 'cable summary ', 3600000, 0, None, None, None, None, None, None, 0, 0, None, None, Decimal('333'), Decimal('100'), Decimal('107'), None, None)]
I am looking to form a parameterized INSERT statement out of this data to run against d_conn.execute(), but how can I convert the data types on the fly without knowing beforehand each column's data type and how to handle it?
For example, I would need to convert the list:
<class 'list'>: [
(Decimal('102'),
datetime.datetime(2016, 3, 29, 13, 53, 20, 593000),
'1459281200561077E9152BF85FCBC33E6689E9DCB61705985137701459280466827',
'cable summary ',
3600000,
0,
None,
None,
None,
None,
None,
None,
0,
0,
None,
None,
Decimal('333'),
Decimal('100'),
Decimal('107'),
None,
None)]
into the SQL INSERT statement:
INSERT INTO [ALERT] VALUES
(102
,'2016-03-29 13:53:20.593'
,1
,'cable summary'
,3600000
,0
,NULL
,NULL
,NULL
,NULL
,NULL
,NULL
,0
,0
,NULL
,NULL
,333
,100
,107
,NULL
,NULL)
I want to be able to do this with any datatypes I pass it as I have a few hundred tables to do this with and don't want to write a function for each one. Any help is appreciated.
For example, I would need to convert the list ... into the [literal] SQL INSERT statement ...
Actually, no you wouldn't, at least not if you really did use a parameterized query for the INSERT. Python's DB-API specifies generic parameter placeholders and relies on the database access layer (e.g., pymssql) to figure out how to handle parameters based on the type of parameter values supplied.
So, you can simply take the list of rows (which contain values of the appropriate type) from the source table and use those rows as parameter values for an executemany on the target table, like so:
# set up test
create_stmt = """\
CREATE TABLE {} (
intCol INT PRIMARY KEY,
nvarcharCol NVARCHAR(50),
datetimeCol DATETIME,
decimalCol DECIMAL(18,4),
nullableCol INT
)
"""
crsr.execute(create_stmt.format("#MainTable"))
load_test_data = """\
INSERT INTO #MainTable (intCol, nvarcharCol, datetimeCol, decimalCol) VALUES
(1, N'test 1', '2016-01-01', 1.11),
(2, N'test 2', '2016-02-02', 2.22),
(3, N'test 3', '2016-03-03', 3.33)
"""
crsr.execute(load_test_data)
crsr.execute(create_stmt.format("#BackupTable"))
# perform the copy ...
read_stmt = """\
SELECT * FROM #MainTable
"""
crsr.execute(read_stmt)
source_rows = crsr.fetchall()
# ... using a true parameterized query ...
write_stmt = """\
INSERT INTO #BackupTable
(intCol, nvarcharCol, datetimeCol, decimalCol, nullableCol)
VALUES
(%s, %s, %s, %s, %s)
"""
# ... inserting all rows at once using list returned by fetchall() from source table
crsr.executemany(write_stmt, source_rows)
# check results
crsr.execute("SELECT * FROM #BackupTable")
print("Rows from #BackupTable:")
for row in crsr.fetchall():
    print(row)
... producing:
Rows from #BackupTable:
(1, 'test 1', datetime.datetime(2016, 1, 1, 0, 0), Decimal('1.1100'), None)
(2, 'test 2', datetime.datetime(2016, 2, 2, 0, 0), Decimal('2.2200'), None)
(3, 'test 3', datetime.datetime(2016, 3, 3, 0, 0), Decimal('3.3300'), None)
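Since the question mentions a few hundred tables, the same idea can be made generic by building the column list and the placeholders from `cursor.description` instead of writing them by hand. The sketch below is an assumption-laden illustration rather than a drop-in script: `copy_table` is a hypothetical helper, and the demo uses two in-memory sqlite3 databases (placeholder `?`) so that it runs anywhere. With pymssql you would pass `placeholder="%s"` as in the statement above. The table name is interpolated into the SQL text, so it must come from a trusted source such as INFORMATION_SCHEMA.

```python
import sqlite3


def copy_table(src_conn, dst_conn, table, placeholder="?"):
    """Copy all rows of `table` from src_conn to dst_conn; return the row count."""
    src = src_conn.cursor()
    src.execute("SELECT * FROM {}".format(table))
    # Column names come from the cursor, so no per-table code is needed.
    columns = [col[0] for col in src.description]
    rows = src.fetchall()
    insert = "INSERT INTO {} ({}) VALUES ({})".format(
        table,
        ", ".join(columns),
        ", ".join([placeholder] * len(columns)),
    )
    # The driver converts each Python value (including None) to a SQL value.
    dst_conn.cursor().executemany(insert, rows)
    dst_conn.commit()
    return len(rows)


# Demo with a made-up `alert` table in two in-memory databases.
src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")
ddl = "CREATE TABLE alert (id INTEGER, created TEXT, note TEXT, amount REAL)"
src.execute(ddl)
dst.execute(ddl)
src.executemany(
    "INSERT INTO alert VALUES (?, ?, ?, ?)",
    [(102, "2016-03-29 13:53:20.593", "cable summary", 333.0),
     (103, None, None, None)],
)
src.commit()

copied = copy_table(src, dst, "alert")
result = dst.execute("SELECT * FROM alert ORDER BY id").fetchall()
print(copied, result)
```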

Grouping, calculating, and sorting scoring data

I have a list of lists, each list having "row id", "team name", "team number", "scout", "score":
teams = [[23L, u'team1', 5713L, u'Gange', 144L],
[22L, u'team3', 1406L, u'Gange', 126L],
[15L, u'team2', 7319L, u'Bob Loblaw', 90L],
[17L, u'team2', 7319L, u'Gange', 54L],
[18L, u'team1', 5713L, u'Bob Loblaw', 69L],
[16L, u'team3', 1406L, u'Bob Loblaw', 113L]]
I want to first group the data by the "team number" value, then get the min/avg/max of the "score" value by team. I can get all this information individually with pandas by using these functions:
res = pd.DataFrame(teams)
res.columns = ['id', 'name', 'number', 'scout', 'score']
print res.groupby('number')['score'].min()
print res.groupby('number')['score'].mean()
print res.groupby('number')['score'].max()
number
406 0
5703 9
7129 18
Name: score, dtype: int64
number
406 9.0
5703 22.5
7129 27.0
Name: score, dtype: float64
number
406 18
5703 36
7129 36
Name: score, dtype: int64
My problem is that I want to keep all the original columns except score, collapsing the rows into a single row per team, with the score column replaced by a list/tuple of the min, avg, and max values from that team's rows. I then need the result as a Python object I can pass to a form, and I'm not sure pandas is the best module for this.
I've looked at some samples with itertools, pandas, numpy, etc, but I'm going in circles now not sure how to approach the problem. Thanks in advance for any advice.
Python comes with batteries included. You can use the power of SQLite from the sqlite3 module.
import sqlite3
teams = [[23L, u'team1', 5713L, u'Gange', 144L],
[22L, u'team3', 1406L, u'Gange', 126L],
[15L, u'team2', 7319L, u'Bob Loblaw', 90L],
[17L, u'team2', 7319L, u'Gange', 54L],
[18L, u'team1', 5713L, u'Bob Loblaw', 69L],
[16L, u'team3', 1406L, u'Bob Loblaw', 113L]]
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("create table t (id int, team_name text, team_number int, scout text, team_score int)");
cur.executemany("insert into t values(?, ?, ?, ?, ?)", teams)
con.commit()
res = cur.execute("""
    SELECT team_number, min(team_score), max(team_score), avg(team_score)
    FROM t
    GROUP BY team_number""")
print "team_number, min, max, avg"
for row in res:
    print row
Output:
team_number, min, max, avg
(1406, 113, 126, 119.5)
(5713, 69, 144, 106.5)
(7319, 54, 90, 72.0)
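To get closer to the shape asked for in the question (one row per team, with the score replaced by a (min, avg, max) tuple), the query result can be reshaped into plain Python lists. This is a sketch under a few assumptions: it is written for Python 3 (so the `L` suffixes and `u` prefixes are dropped), it keeps only the team name and number because the row id and scout differ within a team, and `summary` is a hypothetical variable name.

```python
import sqlite3

teams = [[23, "team1", 5713, "Gange", 144],
         [22, "team3", 1406, "Gange", 126],
         [15, "team2", 7319, "Bob Loblaw", 90],
         [17, "team2", 7319, "Gange", 54],
         [18, "team1", 5713, "Bob Loblaw", 69],
         [16, "team3", 1406, "Bob Loblaw", 113]]

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("create table t (id int, team_name text, team_number int, scout text, team_score int)")
cur.executemany("insert into t values (?, ?, ?, ?, ?)", teams)

cur.execute("""
    SELECT team_name, team_number,
           min(team_score), avg(team_score), max(team_score)
    FROM t
    GROUP BY team_number, team_name
    ORDER BY team_number
""")

# One list per team: [name, number, (min, avg, max)]
summary = [[name, number, (low, mean, high)]
           for name, number, low, mean, high in cur.fetchall()]

for row in summary:
    print(row)
```

The resulting `summary` is an ordinary list of lists, so it can be handed to a form or template without any pandas dependency.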
